RESULT 1
AAE13289 

ID AAE13289 standard; protein; 652 AA. 
XX 

AC AAE13289; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG) protein. 
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolemia; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 17; therapy. 

XX 

OS Mus sp. 



XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P .

PR 15-MAY-2000; 2000US-0204234P . 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 

DR N-PSDB; AAD22008. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 19; Fig 7; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG protein. Mouse SSG is located on chromosome 

CC 17 

XX 

SQ Sequence 652 AA; 

Query Match 100.0%; Score 3369; DB 5; Length 652; 
Best Local Similarity 100.0%; Pred. No. 1.1e-313;

Matches 652; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy    1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60

Qy   61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

Qy  121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180

Qy  181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

Qy  241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

Qy  301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360

Qy  361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420

Qy  421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480

Qy  481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540

Qy  541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

Qy  601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
        ||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652



RESULT 2 
AAE31702 

ID AAE31702 standard; protein; 652 AA. 
XX 

AC AAE31702; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Mouse ABCG5 protein. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolemia; gall stone;

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW mouse; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5.
XX 

OS Mus sp. 
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 
PF 20-NOV-2001; 2001WO-US043823 .
XX 

PR 20-NOV-2000; 2000US-0252235P . 

PR 28-NOV-2000; 2000US-0253645P . 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR N-PSDB; AAD48880. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol-

PT related disorders e.g. sitosterolemia, hypercholesterolemia,

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or

PT nutritional deficiencies. 

XX 

PS Claim 28; Page 74; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is mouse ABCG5 protein 
XX 

SQ Sequence 652 AA; 

Query Match 100.0%; Score 3369; DB 6; Length 652; 

Best Local Similarity 100.0%; Pred. No. 1.1e-313;

Matches 652; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy    1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60

Qy   61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

Qy  121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180

Qy  181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

Qy  241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

Qy  301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360

Qy  361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420

Qy  421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480

Qy  481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540

Qy  541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

Qy  601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
        ||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652





RESULT 3 
AAE13308 

ID AAE13308 standard; protein; 652 AA. 
XX 

AC AAE13308; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG) protein variant #1. 
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; mutein; 
KW sterol-related disorder; hyperlipidaemia; hypercholesterolemia; mutant; 
KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 
KW xanthoma; haemolytic anaemia; transgenic animal; therapy; variant. 
XX 

OS Mus sp.
OS Synthetic. 
XX 

FH Key Location/Qualifiers 
FT Misc-difference 17

FT /note= "Wild type Ile substituted with Leu"

XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P . 
PR 15-MAY-2000; 2000US-0204234P . 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 



XX 

DR WPI; 2002-017598/02. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Disclosure; Page; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG protein variant obtained by replacing Ile17

CC with Leu. Note: The present sequence is not shown in the specification 

CC but is derived from mouse SSG protein referred as SEQ ID NO: 1 (AAE13289) 

CC and shown in figure 7 of the specification 
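Results 3 and 5 are single-residue variants of the wild-type mouse SSG sequence (AAE13289): Ile17Leu and Gly28Ala. A minimal Python sketch of how such a 1-based point substitution can be applied to the query and then recovered by a position-by-position comparison of two gapless alignments; `substitute` and `differences` are hypothetical helper names, and only residues 1-60 are used for brevity.

```python
# Hypothetical helpers illustrating a 1-based point substitution (e.g. Ile17Leu)
# and a position-by-position comparison of two gapless, equal-length sequences.


def substitute(seq: str, pos: int, expected: str, new: str) -> str:
    """Return seq with residue `expected` at 1-based `pos` replaced by `new`."""
    if seq[pos - 1] != expected:
        raise ValueError(f"expected {expected} at {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + new + seq[pos:]


def differences(query: str, subject: str) -> list[tuple[int, str, str]]:
    """List (1-based position, query residue, subject residue) where they differ."""
    if len(query) != len(subject):
        raise ValueError("sequences must have equal length (no gaps)")
    return [(i + 1, q, s) for i, (q, s) in enumerate(zip(query, subject)) if q != s]


# Residues 1-60 of the mouse SSG/ABCG5 query, copied from the alignments in this report
wild_type = "MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS"

variant_1 = substitute(wild_type, 17, "I", "L")  # Ile17Leu (AAE13308)
variant_2 = substitute(wild_type, 28, "G", "A")  # Gly28Ala (AAE13309)

print(differences(wild_type, variant_1))  # [(17, 'I', 'L')]
print(differences(wild_type, variant_2))  # [(28, 'G', 'A')]
```

The explicit `expected` argument guards against an off-by-one position, since the substitution is rejected unless the wild-type residue at that position matches.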

XX 

SQ Sequence 652 AA; 

Query Match 99.9%; Score 3367; DB 5; Length 652; 

Best Local Similarity 99.8%; Pred. No. 1.7e-313; 

Matches 651; Conservative 1; Mismatches 0; Indels 0; Gaps 0; 



Qy    1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
        ||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||
Db    1 MGELPFLSPEGARGPHLNRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60

Qy   61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

Qy  121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180

Qy  181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

Qy  241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

Qy  301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360

Qy  361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420

Qy  421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480

Qy  481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540

Qy  541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

Qy  601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
        ||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652





RESULT 4 
AAU96985 

ID AAU96985 standard; protein; 652 AA. 
XX 

AC AAU96985; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Mouse ABCG5 protein. 
XX 

KW Mouse; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 
KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease.
XX 

OS Mus sp. 
XX 

FH Key Location/Qualifiers 
FT Misc-difference 638..652

FT /note= "Encoded by CTAG" 

XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 
PA (PATE/) PATEL S B. 
PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 



DR N-PSDB; ABK51684. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 42; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the mouse ABCG5 protein of the invention 

XX 

SQ Sequence 652 AA; 

Query Match 99.8%; Score 3363; DB 5; Length 652; 

Best Local Similarity 99.8%; Pred. No. 4e-313; 

Matches 651; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 

Qy    1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60

Qy   61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

Qy  121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180

Qy  181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

Qy  241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

Qy  301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
Db  301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPTVPFKTK 360

Qy  361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420

Qy  421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480

Qy  481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540

Qy  541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

Qy  601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
        ||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652



RESULT 5 
AAE13309 

ID AAE13309 standard; protein; 652 AA. 
XX 

AC AAE13309; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG) protein variant #2. 
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; mutein; 
KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; mutant; 
KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 
KW xanthoma; haemolytic anaemia; transgenic animal; therapy; variant. 
XX 

OS Mus sp. 
OS Synthetic. 
XX 

FH Key Location/Qualifiers 
FT Misc-difference 28

FT /note= "Wild type Gly substituted with Ala" 

XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P . 
PR 15-MAY-2000; 2000US-0204234P . 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 



XX 

DR WPI; 2002-017598/02. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Disclosure; Page; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG protein variant obtained by replacing Gly28 

CC with Ala. Note: The present sequence is not shown in the specification 

CC but is derived from mouse SSG protein referred as SEQ ID NO: 1 (AAE13289) 

CC and shown in figure 7 of the specification 

XX 

SQ Sequence 652 AA; 

Query Match 99.8%; Score 3363; DB 5; Length 652; 

Best Local Similarity 99.8%; Pred. No. 4e-313; 

Matches 651; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 



Qy    1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
        ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
Db    1 MGELPFLSPEGARGPHINRGSLSSLEQASVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60

Qy   61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

Qy  121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180

Qy  181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

Qy  241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

Qy  301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360

Qy  361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420

Qy  421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480

Qy  481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540

Qy  541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

Qy  601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
        ||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652



RESULT 6
AAU96986

ID AAU96986 standard; protein; 652 AA.
XX

AC AAU96986;
XX

DT 07-AUG-2003 (revised)
DT 30-JUL-2002 (first entry)
XX

DE Rat ABCG5 protein.
XX

KW Rat; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol;
KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease.
XX

OS Rattus sp.
XX

PN WO200227016-A2.
XX

PD 04-APR-2002.
XX

PF 25-SEP-2001; 2001WO-US029859 .
XX

PR 25-SEP-2000; 2000US-0235268P .
XX

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES.
PA (PATE/) PATEL S B.
PA (DEAN/) DEAN M.
XX

PI Patel SB, Dean M;
XX

DR WPI; 2002-416483/44.
DR N-PSDB; ABK51686.
XX

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic
PT acid encoding the polypeptide, useful for treating sitosterolemia,
PT arteriosclerosis and heart diseases.

XX 

PS Example 3; Page 45; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the rat ABCG5 protein of the invention. (Updated 

CC on 07-AUG-2003 to correct OS field.) 

XX 

SQ Sequence 652 AA; 

Query Match 93.5%; Score 3150; DB 5; Length 652; 

Best Local Similarity 92.9%; Pred. No. 1.1e-292;

Matches 606; Conservative 25; Mismatches 21; Indels 0; Gaps 0; 
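The summary lines in these records can be cross-checked with two simple ratios. Judging from the reported values (an inference, not taken from GenCore documentation), Query Match appears to be the hit's score as a percentage of 3369, the score of the identical hits in Results 1 and 2, and Best Local Similarity appears to be identical residues as a percentage of alignment length. The Python sketch below recomputes the figures for Results 3, 4 and 6 under that assumption, rounding to one decimal place.

```python
# Sketch: recompute "Query Match" and "Best Local Similarity" from the raw
# numbers in these records. Both formulas are assumptions inferred from the
# reported values, not documented GenCore definitions.

SELF_SCORE = 3369  # score of the identical hits (Results 1 and 2), assumed to be the query's self-score


def query_match(score: int, self_score: int = SELF_SCORE) -> float:
    """Percent of the query's maximum possible score achieved by a hit."""
    return round(100.0 * score / self_score, 1)


def best_local_similarity(matches: int, length: int) -> float:
    """Percent of aligned positions holding identical residues."""
    return round(100.0 * matches / length, 1)


# (result, score, identical matches, alignment length) as reported in the records
hits = [
    ("Result 3", 3367, 651, 652),
    ("Result 4", 3363, 651, 652),
    ("Result 6", 3150, 606, 652),
]

for name, score, matches, length in hits:
    print(name, query_match(score), best_local_similarity(matches, length))
```

For Result 6 this gives 93.5 and 92.9, in line with the record. Under this reading the 25 conservative substitutions do not count toward Best Local Similarity; counting them would give (606 + 25) / 652, about 96.8%.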

Qy    1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
Db    1 MGELPFLSPEGARGPHNNRGSQSSLEEGSWGSEARHSLGVXNVSFSVSNRVGPWWNIKS 60

Qy   61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
Db   61 CQQKWDRKILKDVSLYIESGQTMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

Qy  121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
Db  121 LRRDQFQDCVSYLLQSDVFLSSLTVRETLRYTAMIJVLRSSSADFYDKKV^VLTELSLSH 180

Qy  181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
Db  181 VADQMIGNYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANHIVLLLVELA 240

Qy  241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
Db  241 RRNRIVIVTIHQPRSELFHHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

Qy  301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360

Qy  361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
Db  361 NPPGMFCKLGVLLRRVTRNLMRNKQWIMRLVQNLIMGLFLIFYLLRVQNNMLKGAVQDR 420

Qy  421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
Db  421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYQKWQMLLAYVLHALPFSIVAT 480

Qy  481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
Db  481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGMVQNPNIVNSIVALLSI 540

Qy  541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
Db  541 SGLLIGSGFIRNIEEMPIPLKILGYFTFQKYCCEILWNEFYGLNFTCGGSNTSVPNNPM 600

Qy  601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
Db  601 CSMTQGIQFIEKTCPGATSRFTTNFLILYSFIPTLVILGMWFKVRDYLISR 652



RESULT 7 
AAU96984 

ID AAU96984 standard; protein; 651 AA. 
XX 

AC AAU96984; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 protein. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol;

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW chromosome 2p21. 
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 2..15

FT /note= "Encoded by GGTCTC" 

XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR N-PSDB; ABK51681. 



XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 52; Page 35-36; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 protein of the invention. This 

CC sequence is encoded by the human ABCG5 gene located on chromosome 2p21 

XX 

SQ Sequence 651 AA; 

Query Match 81.5%; Score 2744.5; DB 5; Length 651; 

Best Local Similarity 80.2%; Pred. No. 8.6e-254; 

Matches 523; Conservative 64; Mismatches 64; Indels 1; Gaps 1 
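The summary counts line up with the reported percentage: 523 matches, 64 conservative substitutions, 64 mismatches and 1 indel span 652 aligned columns, and 523 / 652 is about 80.2%, the Best Local Similarity shown above. The sketch below assumes that definition (identical residues over all aligned columns, each indel counted as one column); the definition is inferred from the numbers in this report, not taken from the search program's documentation, and the function name is hypothetical.

```python
def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Percent identity over all aligned columns.

    Assumed definition, inferred from the counts in this report:
    identical residues divided by (matches + conservative + mismatches + indels).
    """
    aligned_columns = matches + conservative + mismatches + indels
    return 100.0 * matches / aligned_columns


# Counts reported for AAU96984 (human ABCG5) against the query
print(f"{best_local_similarity(523, 64, 64, 1):.1f}%")  # prints 80.2%
```

The same formula reproduces the 80.1% reported for the single-substitution mutants (522 matches out of 652 columns).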



Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
Db 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
Db 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
Db 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
Db 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
Db 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
Db 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
Db 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
Db 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
Db 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
Db 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 8 
AAE13290 

ID AAE13290 standard; protein; 651 AA. 
XX 

AC AAE13290; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Human sitosterolaemia susceptibility gene (SSG) protein. 
XX 

KW Human; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; therapy; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 2p21. 

XX 

OS Homo sapiens. 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P . 
PR 15-MAY-2000; 2000US-0204234P . 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 
DR N-PSDB; AAD22009. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 



PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 

XX 

PS Claim 19; Fig 8; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is human SSG protein. Human SSG is located on chromosome 

CC 2p21 

XX 

SQ Sequence 651 AA; 

Query Match 81.5%; Score 2744.5; DB 5; Length 651; 

Best Local Similarity 80.2%; Pred. No. 8.6e-254; 

Matches 523; Conservative 64; Mismatches 64; Indels 1; Gaps 1; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
Db 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
Db 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
Db 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
Db 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
Db 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
Db 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
Db 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
Db 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
Db 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
Db 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651





RESULT 9
AAE31704

ID AAE31704 standard; protein; 651 AA.
XX

AC AAE31704;
XX

DT 24-MAR-2003 (first entry)
XX

DE Human ABCG5 protein.
XX

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder;
KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone;
KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy;
KW human; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG;
KW ABCG5.
XX

OS Homo sapiens.
XX

PN WO200281691-A2.
XX

PD 17-OCT-2002.
XX

PF 20-NOV-2001; 2001WO-US043823 .
XX

PR 20-NOV-2000; 2000US-0252235P .
PR 28-NOV-2000; 2000US-0253645P .
XX

PA (TULA-) TULARIK INC.
PA (TEXA ) UNIV TEXAS SYSTEM.
XX

PI Hobbs HH, Shan B, Barnes R, Tian H;
XX

DR WPI; 2003-058548/05.
DR N-PSDB; AAD48882.
XX

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol-
PT related disorders e.g. sitosterolemia, hypercholesterolemia,
PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or
PT nutritional deficiencies.
XX

PS Claim 28; Page 78-79; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is human ABCG5 protein 
XX 

SQ Sequence 651 AA; 

Query Match 81.5%; Score 2744.5; DB 6; Length 651; 

Best Local Similarity 80.2%; Pred. No. 8.6e-254; 

Matches 523; Conservative 64; Mismatches 64; Indels 1; Gaps 1; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
Db 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
Db 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
Db 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
Db 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
Db 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
Db 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
Db 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
Db 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
Db 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
Db 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 10 
AAU96992 

ID AAU96992 standard; protein; 651 AA. 
XX 

AC AAU96992; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant E146Q protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW mutant; mutein. 
XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 146

FT /note= "Wild-type Glu substituted by Gln"
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 12; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 






CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant E146Q protein of the

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 
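The note above says the mutant was derived from the wild-type ABCG5 entry (AAU96984) rather than printed in the specification. A minimal sketch of that derivation, assuming the usual <wild-type residue><position><new residue> notation for a substitution such as E146Q; the helper `apply_substitution` is hypothetical, and it checks that the expected wild-type residue is present before changing it. The fragment is human ABCG5 residues 120-179, copied from the Db line of the alignment.

```python
def apply_substitution(seq: str, mutation: str, start: int = 1) -> str:
    """Apply a point substitution written like "E146Q".

    `seq` may be a fragment; `start` is the residue number of its first character.
    Raises ValueError if the position is outside the fragment or the
    wild-type residue does not match.
    """
    wild_type, new_residue = mutation[0], mutation[-1]
    position = int(mutation[1:-1])
    index = position - start  # residue number -> 0-based string index
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} lies outside the fragment")
    if seq[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {seq[index]}")
    return seq[:index] + new_residue + seq[index + 1:]


# Human ABCG5 residues 120-179 (Db line of the alignment)
fragment = "LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH"
mutant = apply_substitution(fragment, "E146Q", start=120)
print(mutant[20:30])  # residues 140-149: SSLTVRQTLH
```

Called on residues 360-419 with start=360, the same helper yields the R389H, R419H and R419P variants listed in this report.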

XX 

SQ Sequence 651 AA; 

Query Match 81.4%; Score 2741.5; DB 5; Length 651; 

Best Local Similarity 80.1%; Pred. No. 1.7e-253; 

Matches 522; Conservative 65; Mismatches 64; Indels 1; Gaps 1; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
Db 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
Db 120 LRREQFQDCFSYVLQSDTLLSSLTVRQTLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
Db 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
Db 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
Db 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
Db 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
Db 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
Db 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
Db 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
Db 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 11 
AAU96990 

ID AAU96990 standard; protein; 651 AA. 
XX 

AC AAU96990; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R389H protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW mutant; mutein. 
XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 389

FT /note= "Wild-type Arg substituted by His" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P .
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 7; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 



CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant R389H protein of the

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 651 AA; 

Query Match 81.3%; Score 2739.5; DB 5; Length 651; 

Best Local Similarity 80.1%; Pred. No. 2.6e-253; 

Matches 522; Conservative 64; Mismatches 65; Indels 1; Gaps 1; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
Db 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
Db 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
Db 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
Db 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
Db 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
Db 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITHLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
Db 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
Db 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
Db 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
Db 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 12 
AAU96989 

ID AAU96989 standard; protein; 651 AA. 
XX 

AC AAU96989; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R419H protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 
KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;
KW mutant; mutein. 
XX 

OS Homo sapiens. 
OS Synthetic. 
XX 

FH Key Location/Qualifiers

FT Misc-difference 419

FT /note= "Wild-type Arg substituted by His" 

XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 
PA (PATE/) PATEL S B. 
PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 
PT acid encoding the polypeptide, useful for treating sitosterolemia, 
PT arteriosclerosis and heart diseases. 
XX 

PS Claim 9; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 



CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant R419H protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 651 AA; 

Query Match 81.3%; Score 2739.5; DB 5; Length 651; 

Best Local Similarity 80.1%; Pred. No. 2.6e-253; 

Matches 522; Conservative 64; Mismatches 65; Indels 1; Gaps 1; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSWGTEARHSLGVliHVSYSVSNRVGPWWNIKS 60 

||:| 1:1 S: I :|||| MM I I M M M I I M M M I M I M I 
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

I : I : I M M M M I I M M M M M M M I I M M M I M M I I M M M M I 

Db 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

Qy 121 LRRDQFQDCFSWLQSDVFLSSLTVT^ETLRYTAMl^CRSSADFYNKKV^vMTELSLSH 180 

Mhlllllllllllll MMMMM Ml:ll: I : : MMMI MMM 
Db 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI RRGNPGSFQKKVEAVMAELSLSH 179 

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240 

M I :M M I : I I I I : I I I I I I I I I I I I I I I I M I : I I I I I I I I I I I I I I I I : I I Ml 
Db 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIWLLVELA 239 

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

M M M : M M I M M M I M M M MM M M M M Ml II I : I II I I I I I I I M I 

Db 240 RRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299 

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360 

I M I II II I II I : I I II I I MMMI I : I : I I II MMM :: I I I I I I II I I I I 
Db 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

Q y 361 DPP GMFGKLGVL L RRVT RN LMRNKQAVIMRLVQN L I MGL FL I F Y L L RVQNNT L KGAVQ D R 420 

I MM M I I I I M M I I I : M I Ml M : M I I M M I : I : : M I : : I IMMM 
Db 360 DSPGVFSKLGVXLRRWRNLvT^KIAVITRLLQNLIMGLFLLFFVXRVRSNVLKGAIQDH 419 



Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480 

I I I I I I I M I I I I I I I I I I I I I I : I I II I I I I I I I I I I lllhlll I I I I I I I I : I I 

Db 420 VGLL YQ FVGAT P YT GMLNAVN L F P VL RAVS DQ E S Q D GL YQ KWQMMLAYALHVL P FS WAT 479 

Qy 481 VI FS SVCYWTLGLYPEVARFGYFSAALIAPHLI GEFLTLVLLGI VQNPNI WS I VALLS I 540 

: I I I I I I I I I I M : I I I I I I I I I I II I I M I I I I I I I I I I I I I I I I I I I I I I I : I I I I M 
Db 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSI 539 

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGLNFTCGGSNTSMLNHPM 600 

: I : I : I I I I : I I I II I I I I I I : I I I I I I I I I I I I I I I I I I II I I I I II I : : I I 
Db 540 AGVLVGSGFLRNIQEMPI PFKI I SYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPM 599 

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652

II I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I I : I I I I 

Db 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 13 
AAU96993 

ID AAU96993 standard; protein; 651 AA. 
XX 

AC AAU96993; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R419P protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol;
KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;
KW mutant; mutein. 
XX 

OS Homo sapiens. 
OS Synthetic. 
XX 

FH Key Location/Qualifiers 
FT Misc-difference 419

FT /note= "Wild-type Arg substituted by Pro" 

XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859.
XX 

PR 25-SEP-2000; 2000US-0235268P.
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 
PA (PATE/) PATEL S B. 
PA (DEAN/) DEAN M.
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 
PT acid encoding the polypeptide, useful for treating sitosterolemia, 
PT arteriosclerosis and heart diseases. 



PS Claim 10; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant R419P protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 651 AA; 

Query Match 81.3%; Score 2737.5; DB 5; Length 651; 

Best Local Similarity 80.1%; Pred. No. 4e-253; 

Matches 522; Conservative 64; Mismatches 65; Indels 1; Gaps 1 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
I I : I I : I I : I : I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I
Db 1

Qy 61
|:|:| I I I M I I i 11:1 I I i I I I I I I I II I I I I I I I I 1:1 M I I II 111 = 111
Db 60

Qy 121
I I I : I II I I II I I I I I I I I I I I I I I I I I I I : I I : I : - I I I I I I I I I I I I I
Db 120

Qy 181
I I : I : I I I I : I I I I I I I I I I I I I I I I I I I : I I I I II I I I I I I I I I I : M III
Db 180

Qy 241
I I : I I I :: I I I I I I I I I I I I I I I I I I : : I I I : I I I I I III I I I : I I I I I I I I I I I I I
Db 240

Qy 301
I I I I I I I I I I I I : I I I I I I 11111 = 1 I:hl I II I : I I I I :: I I I I I I I I I I I
Db 300



Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420

I II:! I I I I I I I I I I I I I : I I I III I 1:1 I I I I I I I l:|::l I l::i Nihil 

Db 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDP 419

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480

II II I I I I I I II I I I I I I I I I I I : I I I I II I I I I I I I I lllhlll lllllllhM 

Db 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540 

: I M I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
Db 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

: I : I : I I I I : I I I I I I I I I I I : I I I I I I I I I I I II I I I I I I I I I I I II I : : I I 
Db 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652 

II I I I : I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I : I I : I I : I I I I 
Db 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 14 
ABP52128 

ID ABP52128 standard; protein; 649 AA. 
XX 

AC ABP52128; 
XX 

DT 10-OCT-2002 (first entry) 
XX 

DE Homo sapiens ABC transporter ABCG5 protein SEQ ID NO: 80. 
XX 

KW ATP-binding cassette transporter; ABC transporter; modulation; D loop; 

KW cancer; bacterial infection; fungal infection; protozoal infection; 

KW antibacterial; fungicide; protozoacide.
XX 

OS Homo sapiens. 
XX 

PN EP1217066-A1. 
XX 

PD 26-JUN-2002.
XX 

PF 21-DEC-2000; 2000EP-00870316. 
XX 

PR 21-DEC-2000; 2000EP-00870316.
XX 

PA (UYGE-) UNIV GENT. 
XX 

DR WPI; 2002-550404/59. 
XX 

PT Modulating activity of ATP-binding cassette (ABC) transporters by 

PT influencing dimerization of nucleotide binding domains through use of D 

PT loop sequence of an ABC transporter, or its antisense peptide or peptide 

PT mimetic. 

XX 

PS Disclosure; Fig 3; 290pp; English. 
XX 

CC The present invention describes a method (M1) for modulating the activity

CC of ATP-binding cassette (ABC) transporters by influencing the 

CC dimerisation of the nucleotide binding domains comprises using: (a) a 

CC polypeptide (polyP) consisting of 5-50 amino acids comprising the D loop 

CC sequence of an ABC transporter (ABP52049 to ABP52091); (b) a polyP

CC consisting of the D loop sequence of an ABC transporter; (c) a peptide 

CC mimetic or antisense peptide of (a) or (b). ABC transporters have

CC antibacterial, fungicide and protozoacide activities. (M1) is useful for

CC selectively modulating the activity of ABC transporters belonging to the 

CC group of multidrug transporter/P-glycoproteins. Bacterial, fungal or

CC protozoal ABC transporters are involved in the infection of a mammal or 

CC in the induction of resistance to antibiotics or drugs in a mammal. (M1)

CC is useful for preventing, treating or alleviating diseases associated 

CC with functionality of an ABC transporter. ABP52092 to ABP52140 represent 

CC ABC transporter proteins given in the exemplification of the present 

CC invention 

XX 

SQ Sequence 649 AA; 

Query Match 80.8%; Score 2722.5; DB 5; Length 649; 

Best Local Similarity 79.9%; Pred. No. 1.1e-251;

Matches 521; Conservative 64; Mismatches 64; Indels 3; Gaps 2; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

I I : I 1:11:1 : I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I 

Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

I : I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I M I I II I I I : I I I 
Db 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180

I | I : I I I I I II I I I I I I I I I I I I I I I I I I I : I I : I ' ' I I I I I I I I I I I I I 
Db 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240 

I I I :: I I : I : I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I II I : I I III 
Db 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLF--PTTGLDCMTANQIVVLLVELA 237

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

I I : I I I :: I I I I I I I I I I I I I I I I I I :: I I I : I I I I I III I I I : I I I I I I I I I I I I I 
Db 238 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 297

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360

I | | M M I I I I I : I I II I I II I I I : I I : I : I I II I : I I I I :: I I I I I I I I I I M 
Db 298 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 357

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

I ||:| I II I I I I I I I I I I : I I I Ml I I : I I I I I II I I : I :: I II I I I I I : I I I 
Db 358 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 417 

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480

I I I I M I I I I I I I I I II II I I I I : I I I M I I II I I I I I 1111:111 I I I I I I I I : I I 
Db 418 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 477

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540 

: I I I I I I I I II II : I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I : I I I I I I 
Db 478 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 537



Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

: I : I : I I I I : I I I I I I I I I I I : I I I II I I I I I I I I I I I I I I I I I I I II I : : I I 
Db 538 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 597

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652

II I I I : I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I : I I : I I : I I I I 
Db 598 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 649



RESULT 15 
AAU96991 

ID AAU96991 standard; protein; 408 AA. 
XX 

AC AAU96991; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R408X protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW mutant; mutein. 
XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 408

FT /note= "Wild-type protein truncated at this position" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859.
XX 

PR 25-SEP-2000; 2000US-0235268P.
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 10; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 



CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant R408X protein of the

CC invention. Note: This sequence is not shown in the specification but is

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 408 AA; 

Query Match 48.0%; Score 1618.5; DB 5; Length 408; 

Best Local Similarity 76.5%; Pred. No. 4.5e-146; 

Matches 313; Conservative 45; Mismatches 50; Indels 1; Gaps 1; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60

||:| |:| |: I : I I I I MM I I I I I I : I I I I I I I : I I I I I : I I 

Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

| : I : I | I I I I I I I I I : I I I I I I I I I I I I I I I I I M I I I : I I I I I II I I I : I I I 
Db 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180

| | | : I I I I I I M I I I I I I I I I I I I I I I llhll: I : : I I I I I I I I I I I I I 
Db 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240 

|||::||:|: | | | | : I I I I I I I I I I I I I I I I I I I : I II I I I I I I I I I I I I I : I I IN 

Db 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

I I : I I I : : M I I M S I I I I I I I I I I I :: I I I : I I I I I Ml M I : I M I I I I I I II I I 
Db 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360 

M I I M I I I I I I : I I I I I I Mlll:l hlM I I I MINI :: I I I I I I I I I I I I 
Db 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQ 409 

I ||:| I M I I I I I I I I I I : I I I Ml ll:|llllllll:|::IM: 
Db 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVR 408 
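The summary percentages in Results 13 to 15 can be cross-checked from the raw counts. The minimal Python sketch below assumes two relationships that are inferred from the printed numbers rather than taken from GenCore documentation: Query Match is the hit's Score as a percentage of the perfect score, and Best Local Similarity is Matches as a percentage of all aligned positions (Matches + Conservative + Mismatches + Indels). The perfect score of 3369 is the value reported in the header of the Issued_Patents_AA run that follows, which uses the same 652-residue query; applying it to this run is an assumption. The function names are illustrative only.

```python
"""Recompute GenCore summary percentages from the raw counts (illustrative sketch)."""


def query_match(score: float, perfect_score: float) -> float:
    """Hit score as a percentage of the query's perfect score, to one decimal."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Identical positions as a percentage of all aligned positions, to one decimal."""
    aligned = matches + conservative + mismatches + indels
    return round(100.0 * matches / aligned, 1)


PERFECT_SCORE = 3369  # assumed: perfect score reported for the 652-residue query

# accession: (Score, Matches, Conservative, Mismatches, Indels) as printed in the report
HITS = {
    "AAU96993": (2737.5, 522, 64, 65, 1),
    "ABP52128": (2722.5, 521, 64, 64, 3),
    "AAU96991": (1618.5, 313, 45, 50, 1),
}

if __name__ == "__main__":
    for accession, (score, m, c, mm, ind) in HITS.items():
        print(accession,
              query_match(score, PERFECT_SCORE),
              best_local_similarity(m, c, mm, ind))
```

Run as a script, it prints 81.3/80.1, 80.8/79.9 and 48.0/76.5, which agree with the Query Match and Best Local Similarity values reported for AAU96993, ABP52128 and AAU96991.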



Search completed: February 27, 2004, 06:44:17 
Job time : 49.5363 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model

Run on:         February 27, 2004, 07:11:48 ; Search time 14.7734 Seconds
                (without alignments)
                2278.426 Million cell updates/sec

Title:          US-09-989-981A-2
Perfect score:  3369
Sequence:       1 MGELPFLSPEGARGPHINRG......PALVILGIVIFKVRDYLISR 652

Scoring table:  BLOSUM62
                Gapop 10.0 , Gapext 0.5

Searched:       389414 seqs, 51625971 residues

Total number of hits satisfying chosen parameters: 389414

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
Listing first 45 summaries

Database :      Issued_Patents_AA:*
                1: /cgn2_6/ptodata/2/iaa/5A_COMB.pep:*
                2: /cgn2_6/ptodata/2/iaa/5B_COMB.pep:*
                3: /cgn2_6/ptodata/2/iaa/6A_COMB.pep:*
                4: /cgn2_6/ptodata/2/iaa/6B_COMB.pep:*
                5: /cgn2_6/ptodata/2/iaa/PCTUS_COMB.pep:*
                6: /cgn2_6/ptodata/2/iaa/backfiles1.pep:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
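The note above says Pred. No. comes from an analysis of the total score distribution but does not state the model. A common approach for local alignment scores is to fit an extreme-value (Gumbel) distribution to the background scores and multiply its upper-tail probability by the number of database sequences. The sketch below does that with a method-of-moments fit; whether GenCore uses this model or this fitting method is an assumption, the background scores are synthetic (a hypothetical Gumbel with location 60 and scale 8), and the printed values are therefore not the report's Pred. No. figures. Only the database size, 389414 sequences, comes from the header.

```python
"""Illustrative E-value-style estimate from a score distribution (not GenCore's code)."""
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(scores):
    """Method-of-moments fit of a Gumbel (type I extreme-value) distribution."""
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi  # scale
    mu = statistics.mean(scores) - EULER_GAMMA * beta          # location
    return mu, beta


def predicted_number(score, mu, beta, n_db):
    """Expected number of database entries scoring >= score by chance."""
    z = (score - mu) / beta
    # P(S >= score) = 1 - exp(-exp(-z)); expm1 keeps precision in the far tail.
    p_tail = -math.expm1(-math.exp(-z))
    return n_db * p_tail


def sample_gumbel(rng, mu, beta):
    """Draw one Gumbel variate by inverse-transform sampling."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log arguments finite
    return mu - beta * math.log(-math.log(u))


rng = random.Random(0)
# Hypothetical background: 10,000 chance scores from Gumbel(mu=60, beta=8).
background = [sample_gumbel(rng, 60.0, 8.0) for _ in range(10_000)]
mu_hat, beta_hat = fit_gumbel(background)

N_DB = 389414  # sequences searched, from the header above
for s in (100.0, 200.0, 693.5):
    print(f"score {s:>6}: predicted number {predicted_number(s, mu_hat, beta_hat, N_DB):.3g}")
```

A moments fit keeps the sketch dependency-free; on real score data a maximum-likelihood fit (for example `scipy.stats.gumbel_r.fit`) would usually be preferred.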
ALIGNMENTS



RESULT 1 
US-09-245-808-1 

; Sequence 1, Application US/09245808 
; Patent No. 6313277 
; GENERAL INFORMATION: 
; APPLICANT: Doyle, L. Austin 
; APPLICANT: Abruzzo, Lynne V.
; APPLICANT: Ross, Douglas D. 

; TITLE OF INVENTION: Breast Cancer Resistance Protein (BCRP) and DNA which 
; TITLE OF INVENTION: encodes it 
; FILE REFERENCE: Ross UMb conversion 
; CURRENT APPLICATION NUMBER: US/09/245,808
; CURRENT FILING DATE: 1999-02-05
; EARLIER APPLICATION NUMBER: 60/073763
; EARLIER FILING DATE: 1998-02-05
; NUMBER OF SEQ ID NOS: 7
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 1
; LENGTH: 655
; TYPE: PRT 

; ORGANISM: Human MCF-7/AdrVp cells 
US-09-245-808-1 

Query Match 20.6%; Score 693.5; DB 4; Length 655; 

Best Local Similarity 29.0%; Pred. No. 3.9e-61; 

Matches 181; Conservative 142; Mismatches 246; Indels 55; Gaps 16; 

Qy 25 LEQGSVTGTEARHS LGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDV 73 

: I I : I I I I : : | I : I | : : : : : | I : : 

Db 12 VSQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSG FLPCRKPVEKEILSNI 67 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133 

: : : | : I I I : I I I : : I I I : : I : I I I : I : I I I I : M 
Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDAALINGAP-RPANFKCNSGYV 124 

Qy 134 LQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFG 192

: I II : : I I I I I I : : : I I I : : I::: I: II I III :h 
Db 125 VQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIR 184

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252 

I : | I I I : I I I : I : I I : : I I I I I I I I I I I I :: I I I : : : : I : I : I I I 
Db 185 GVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQ 244 

Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311

|| :|: || : :| I hi I :| Ihl : II I : : I I I I : : I : : I : 
Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304 

Qy 312 - S RERE I ET YK RVQMLE CAFKE S D I YH K I LENIERARYLKT 351 

: || | :| :::| : :: : I: I I: : : 

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363 

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411 

: : I : I : : I : I I : I I I I : : : : : : I I : : ■ I = 

Db 364 FKEISYTT----SFCHQLRWVSKRSFKNLLGNPQASIAQIIVTVVLGLVIGAIYFGLKND 419

Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470

: : I : I I : I : I :: ::|| II : : : I II I M 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530 

: I I : : : : : I I : : I : I I I I : I I : = : : : I I : : 
Db 477 DLLPMTMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAIAAGQSV 533 

Qy 531 VNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTC 588

|: |::| ::| || : |: : I III: : I I Ml I II I 

Db 534 VSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592 

Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612 

I I : I I I ::: I 

Db 593 PGLNATGNNPCNYATCTGEEYLVK 616 



RESULT 2 
US-09-767-594-1 

; Sequence 1, Application US/09767594 



Patent No. 6521635 
GENERAL INFORMATION: 
APPLICANT: Bates, Susan 
APPLICANT: Robey, Robert

APPLICANT: The Government of the United States of America 
APPLICANT: as represented by the Secretary of the 
APPLICANT: Department of Health and Human Services 

TITLE OF INVENTION: Inhibition of MXR Transport by Acridine Derivatives 
FILE REFERENCE: 015280-402100US 
CURRENT APPLICATION NUMBER: US/09/767,594
CURRENT FILING DATE: 2001-01-22 
PRIOR APPLICATION NUMBER: US 60/177,410 
PRIOR FILING DATE: 2000-01-20 
NUMBER OF SEQ ID NOS: 2
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 1 
LENGTH: 655 
TYPE: PRT 

ORGANISM: Homo sapiens 
FEATURE:

OTHER INFORMATION: human mitoxanthrone resistance (MXR) /BRCP/ABCP 
OTHER INFORMATION: protein 
US-09-767-594-1 

Query Match 20.5%; Score 689.5; DB 4; Length 655; 

Best Local Similarity 29.0%; Pred. No. 9.8e-61; 

Matches 181; Conservative 141; Mismatches 247; Indels 55; Gaps 16; 

Qy 25 LEQGSVTGTEARHS LGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDV 73

: ||: I I I I :: I I : I |:: :::M :: 

Db 12 VSQGNTNGFPATVSNDLKAFTEGAVLSFHNICYRVKLKSG FLPCRKPVEKEILSNI 67 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133 

: : : I : I I I : I I I : : I II : : I : I I I : I : I I I I : I I 
Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLINGAP-RPANFKCNSGYV 124

Qy 134 LQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFG 192

: I I I : : I I I M I : : : I I I : : I : : : hill I I I : h 
Db 125 VQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIEELGLDKVADSKVGTQFIR 184 

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252

I : I I I I : I I I : h I I : : I I I I I I I I I I I I : : II I : : : : I : I : I I I 

Db 185 GVSGGERKRTSIGMELITDPSILSLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQ 244 

Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311

II : | : I I : : I I I : I I : I I I : I : I I I : : I I I h : h : I : 

Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304 

Qy 312 -SREREIETYKRVQMLECAFKESDIYHKI LENIERARYLKT 351 

: | | | : I : : : I : : : : |: I h : : 

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363 

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411 

: : I : I : : I : I h I I I I : : : : : : I I : : : h 

Db 364 FKEISYTT----SFCHQLRWVSKRSFKNLLGNPQASIAQIIVTVVLGLVIGAIYFGLKND 419



Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470



: * I • l i • • • • i i it • • • i ii i * i 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530

: I I : : : : I I : : I : I I I I : I I : : : : : | I : : 

Db 477 DLLPMRMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLM---MVAYSASSMALAIAAGQSV 533

Qy 531 VNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTC 588

|: |::| ::| II : |: : I I I I : :| I III I II I 

Db 534 VSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592 

Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612 

I I : I I I ::: I 

Db 593 PGLNATGNNPCNYATCTGEEYLVK 616 



RESULT 3 

US-09-614-912-140 

Sequence 140, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 
PRIOR FILING DATE: 1999-07-30 
PRIOR APPLICATION NUMBER: 60/170,906 
PRIOR FILING DATE: 1999-12-15 
PRIOR APPLICATION NUMBER: 60/172,959 
PRIOR FILING DATE: 1999-12-21 
PRIOR APPLICATION NUMBER: 60/172,946 
PRIOR FILING DATE: 1999-12-21 
NUMBER OF SEQ ID NOS: 204
SOFTWARE: Microsoft Office 97 
SEQ ID NO 140 
LENGTH: 1296 
TYPE: PRT 

ORGANISM: Oryza sativa 
US-09-614-912-140 



Query Match 13.6%; Score 457.5; DB 4; Length 1296;
Best Local Similarity 26.8%; Pred. No. 1.1e-36;



Matches 169; Conservative 100; Mismatches 244; Indels 117; Gaps 23; 



Qy 85 ILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLT 144 

: I I | | | | | | | | :: I : : 111 MM I : I : I : I I : : I 

Db 9 LLGPPSSGKTTLLLALAGKLDPSLRRGGEVTYNGFELEEFVAQKTAAYISQTDVHVGEMT 68 

Qy 145 VRETLRYTAMLALCRS SADFYNKKVEAVMTE 175 

I : M I : : I I : I : I I 

Db 69 VKETLDFSAR CQGVGTKYDLLTELARREKEAGIRPEPEVDLFMKATSMEGVESSLQT 125 

Qy 176 LSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTA 229 

|| I :M III MM: :: ||: :|| MM! I 

Db 126 DYTLRILGLDICADTIVGDQMQRGISGGQKKRVTTGEMIVGPTKVXFMDEISTGLDSSTT 185

Qy 230 NQIVLLLAELAR-RDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNC 288 

Ml | : : : : : : : : I I I I : I I I : I : I : : I : I I : I I I : I 
Db 186 FQIVKCLQQIVHLGEATILMSLLQPAPETFELFDDIILLSEGQIVYQGPREYVLEFFESC 245 

Qy 289 GYPCPEHSNPFDFYMDLTSVDTQSR — EREIETYKRVQMLECA — FKESDIYHKILENIE 344 

|:||| II : : I I I : : I : : : I I II : I I : 
Db 246 GFRCPERKGTADFLQEVTSKKDQEQYWADKHRPYRYISVSEFAQRFKR FHVGLQ 299 

Qy 34 5 RARYLKTLPMVPF-KTKDPPG — MFGKLGVLLRRVTRN LMRN KQAVIMRLVQ 393 

I : Mill: Mil : : : M I : M 

Db 300 LENHLSVPFDKTRSHQAALVFSKQSVSTTELLKAS FAKEWLLI KRNS FVYI FKTIQ 355 

Qy 394 NLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQ- 452 

:|: I 111:1 : I I M M I : : : M I I I : 

Db 356 LII VALVASTVFLRTQMHTRN — LDD — GFVY — IGALLFSLIVNMFNGFAELSLTITRL 409 

Qy 453 E S QDGL- YHKWQMLLAYVLHVL P FS VI AT VI FS S VC YWT LGL YP EVARFG YFS AAL 507 

: :| I I I I I: MUM :::: I IMM II II I I 
Db 410 PVFFKHRDLLFYPAWI FTLPNVI LRI P FS 1 1 ES I VWVI VT YYT I GFAPEADRF — FKQLL 467 

Qy 508 LAPHLIGEFLTLVLLG IVQNPNIVNSIVALLSISGLLIGSGFIRNIQ 554 

| || : I I : ::: : I I : I I 

Db 468 LV FLIQQMAGGLFRATAGLCRSMIIAQTGGALALLIFFVLGGFLLPKAFIPK — 519 

Qy 555 EMPIPLKILGYFTFQ-KYCCEILWNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKT 613 

III: I I I I II I : I : I : : MM 

Db 520 WWIWGYWVSPLMYGYNALAVNEFYSPRW MNKFVLDNNGVP KRLGI ALME — 568 

Qy 614 CPGATSRFTANFLI LYGFI PALVILGIVI F 643 

II M M I M I M 

Db 569 — GANIFTDKNWF WIGAAGLLGFTMF 592 



RESULT 4 

US-09-614-912-138 

Sequence 138, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 



APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 
PRIOR FILING DATE: 1999-07-30 
PRIOR APPLICATION NUMBER: 60/170,906 
PRIOR FILING DATE: 1999-12-15 
PRIOR APPLICATION NUMBER: 60/172,959 
PRIOR FILING DATE: 1999-12-21 
PRIOR APPLICATION NUMBER: 60/172,946 
PRIOR FILING DATE: 1999-12-21 
NUMBER OF SEQ ID NOS: 204
SOFTWARE: Microsoft Office 97 
SEQ ID NO 138 
LENGTH: 617 
TYPE: PRT 

ORGANISM: Zea mays 
US-09-614-912-138 



Query Match 12.5%; Score 419.5; DB 4; Length 617;

Best Local Similarity 24.6%; Pred. No. 2.3e-33; 

Matches 152; Conservative 130; Mismatches 240; Indels 97; Gaps 



Qy 66 DR-QILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRD 124
| | | : | : : I : I : : : 1 1 1 : 1 1 1 1 1 : 1 : : 1 1 : 1 : 1 1 : : : 1 :
Db 34 DRLQLLREVTGSFRPGVLTALMGVSGAGKTTLMDVLAGR-KTGGYIEGDIRIAGYPKNQA 92


Qy 125 Q FQ DC F S YVLQ S D VFL S S LT VRET L R YT AMLAL CRSSADFYNKKVEAVMTELSL 178
I I I : I : : 1 1 1 1 : 1 1 : 1 1 1 : 1 : 1 : 1 1 : 1
Db 93 TFARISGYCEQNDIHSPQVTVRESLIYSAFLRLPGKIGDQEITDDIKMQFVDEVMELVEL 152


Qy 179 SHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAE 238
: : | : : | I : I : : 1 : 1 : : 1 1 : 1 : : 1 : : : 1 II 1 : 1 1 1 1 : : •
Db 153 DNLRDALVGLPGITGLSTEQRKRLTIAVELVANPSIIFMDEPTSGLDARAAAIVMRTVRN 212


Qy 239 LARRDRIVIVTIHQPRSELFQHFDKIAILTY-GELVFCG-----TPEEMLGFFNNC-GYP- 291
11:11111 ::|: II:: :| |:::: 1 ::|: :| 1 1
Db 213 TVDTGRTVVCTIHQPSIDIFESFDELLLLKRGGQVIYSGKLGRNSQKMVEYFEAIPGVPK 272


Qy 


292 


CPEHSNPFDFYMDLTSVDTQSR-EREIETYKRVQMLECAFKESDIY — HKI LENI ERARY 

: || : : : : : 1 1 1 : 1 : : 1 : : 1 1 : 1 : 1 : 1 1 : : : 
IKDKYNPATWMLEVSSVATEVRLKMDFAKY YETSDLYKQNKVLVN-QLSQP 


348 


Db 


273 


322 


Qy 


349 


LKTLPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFL — IFYLL 


406 


Db 


323 


: | | : I : 1 : 1 : : : 1 1 : : 1 1 1 1 : : 
EPGTSDLYFPTEYSQSTIGQFKACLWKQWLTYWRSPDYNLVRYSFTLLVALLLGSIFWRI 


382 



Qy 407 — RVQNNTLKGAVQDRVGLL YQLVGAT P YTGMLNAVNLFPML RAVSDQESQDGLYHK 4 61 

::: | | I :| :| I : |: I : I:: II :| hi 
Db 383 GTNMEDATTLGMV IGAMYT AVMFIGINNCSTVQPWSIERTVFYRERAAGMYSA 436 

Qy 462 WQMLLAYVLHVLPFSVIATVI FSSVCY WTLGLYPEVARFGYFSAALLAPHLIGE 515 

:| |: :|: : I :: : I II : III 
Db 437 MP YAIAQWI EI PYVFVQTT YYTLI VYAMMS FQWTAVKFFWFFFI S YFS 485 

Qy 516 FLTLVLLGIVQ NPN-IVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKY 571 

II I : : : I I I I I I I : I I I : I I : I I : : : = 

Db 486 FLYFTYYGMMAVSISPNHEVASIFAAAFFSLFNLFSGFF IPRP-RIPGWWIWYYW 539 

Qy 572 CCEILWNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTANFLILYGF 631 

I : I I I I I ::: I I I :: : : I 

Db 540 ICPLAWT — VYGLIVTQYGDLEDLISVP GESEQTI SYYVTHHF 580 

Qy 632 IPALVILGI 640 

Ml: : 

Db 581 GYHRDFLPVIAPVLVLFAV 599 



RESULT 5 

US-09-614-912-144 

Sequence 144, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912
CURRENT FILING DATE: 2000-07-12
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 
PRIOR FILING DATE: 1999-07-30 
PRIOR APPLICATION NUMBER: 60/170,906 
PRIOR FILING DATE: 1999-12-15 
PRIOR APPLICATION NUMBER: 60/172,959 
PRIOR FILING DATE: 1999-12-21 
PRIOR APPLICATION NUMBER: 60/172,946 
PRIOR FILING DATE: 1999-12-21 
NUMBER OF SEQ ID NOS: 204
SOFTWARE: Microsoft Office 97 
SEQ ID NO 144 
LENGTH: 539 
TYPE: PRT 



; ORGANISM: Triticum aestivum 
FEATURE:
NAME/KEY: UNSURE
LOCATION: (272)..(273)
US-09-614-912-144 



Query Match 9.8%; Score 330.5; DB 4; Length 539; 

Best Local Similarity 21.9%; Pred. No. 2e-24; 

Matches 111; Conservative 120; Mismatches 210; Indels 67; Gaps 



Qy 108 GTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNK 167
| :|||: |:| ::: I 1 |:|: : 1 : 1 : 1 : : 1 1 1 : 1 :
Db 2 GYIEGEITVSGYPKKQETFARISGYCEQNDIHSPHWIYESLVFSAWLRL-PAEVDSERR 60


Qy 


168 


K — VEAVMTELSLSHVAI)QMIGSYNFGGISSGERRRVSIAAQLLQDPKvMy[LDEPTTGLD 

| :| :| : |: : ::| |:|: :|:|::|| :|: :| :: :|lll:|ll 
KMFIEEIMDLVELTSLRG7VLVGLPGVNGLSTEQRKRLTIAVELVANPSIIFMDEPTSGLD 


225 


Db 


61 


120 


Qy 226 CMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAILTY-GELVFCG----TPEE 280
I : : : 1 1 : 1 II 1 1 : : I : I I : : : : 1 1 : : 1
Db 121 ARAAAIVMRTVRNTVNTGRTVVCTIHQPSIDIFEAFDELFLMKRGGEEIYVGPVGQNSAN 180


Qy 281 MLGFFNNC GYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLECAFKE 332
:: :| II II : ::::| 1 : h : 1 :::
Db 181 LIEYFEEIEGISKIKDGY NPATWMLEVSS SAQEEM LGIDFAE-VYRQ 226


Qy 333 SDIYHKILENIERARYLKTLPM VPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQ 385
1 : : 1 : 1 : 1 1 1 : 1 1 : : 1 : =11
Db 227 SELYQRNKE LIKELSMPAPGSSDLNFPTQYSRSFVTQCLACLWKQXXSYWRNPS 280


Qy 


386 


AVI MRLVQN L IMG L FL I F YL L RVQNNT L KGAVQD R VGL L YQLVGAT P YT GMLNAVNL F PM 
. : I I : : : : 1 : : 1 : II : : 1 1 1 : 1 : : : 1 : 
YTAVRLLFTIVIALMFGTMFWDLGSKTRRS — QDLFNAMGSMYAAVLYIGVQNSGSVQPV 


445 


Db 


281 


338 


Qy 446 L RAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGY 502
: 11:1 |:| : 1 |: :: :|: : | :| l|:| :
Db 339 VWERT VF YRERAAGMY S AF P YAFGQVAI E F P YVLVQAL I YGGLVY SMI G FEWT VAK FLW 398


Qy 503 FSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSISGLLIG SGFIRNIQEMPIP 559
: : I | : : : | : I 1 1 1 : : 1 : : 1 1 : : : : 1 1
Db 399 YL FFMYFTML YFT FYGMMAVGLT PN ES I AAI I S SAFYNVWNLFS GYLI P RP KLP I - 453


Qy 560 LKILGYFTFQKYCCEI-LVVNEF 581
:: : : | : 1 1 :: 1
Db 454 WWRWYSWICPVAWTLYGLVASQF 476





RESULT 6 

US-09-489-039A-9127 

; Sequence 9127, Application US/09489039A 

; Patent No. 6610836 

; GENERAL INFORMATION: 

; APPLICANT: Gary Breton et al. 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO KLEBSIELLA 

; TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS 
; FILE REFERENCE: 2709.2004001 



CURRENT APPLICATION NUMBER: US/09/489,039A 
CURRENT FILING DATE: 2000-01-27 
PRIOR APPLICATION NUMBER: US 60/117,747 
PRIOR FILING DATE: 1999-01-29 
NUMBER OF SEQ ID NOS: 14342 
SEQ ID NO 9127 
LENGTH: 384 
TYPE: PRT 

ORGANISM: Klebsiella pneumoniae 
US-09-489-039A-9127 

Query Match 7.8%; Score 262.5; DB 4; Length 384; 

Best Local Similarity 26.3%; Pred. No. 8.9e-18; 

Matches 78; Conservative 58; Mismatches 118; Indels 43; Gaps 8; 

Qy 68 QILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQ 127 

| : | I : I I I I I I ::: I I I I I I I I I I I I : I : i I = : I : : I : 

Db 36 QVLNDISLDIPSGQMVALLGPSGSGKTTLLRIIAGLEHQT SGHIRFHGTDVSRMHAR 92 

Qy 128 D-CFSYVLQSDVFLSSLTVRETLRY--TAMLALCRSSADFYNKKVEAVMTELSLSHVADQ 184 

| : | | :|| : : : I : I :l || :: : |:|:||: 

Db 93 DRKVGFVFQHYALFRHMTVFDNIAFGLTVLPRRERPNAAAIKAKVT^ 152 

Qy 185 MIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVIjLL 244 

:| |:::||::| i :|::::IMI M :: I :l : 

Db 153 YPAQ LSGGQKQRVALARALAVEPQILLLDEPFGALDAQVRKELRRWLRQLHEELK 207 

Qy 245 IVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPE EMLGFFNN 287 

| : : | : | : : : : : I : II I : I I 

Db 208 FTSVFVTHDQEEAMEVADRVVWSQGNIEQADAPERVWREPSTRFVLEFMGEWRLQGVI 267 

Q y 288 CGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLECAFK 331 

II | : ||:: II I I ::: 11 = 11 = I 
Db 268 RGGQFHVGAHRWPLGY-TPAYQGPVDLFLRPWEVDI-SRRTSLDSPLPVQVLEASPK 322 



RESULT 7 

US-08-665-259-25 

Sequence 25, Application US/08665259 
Patent No. 6028173 
GENERAL INFORMATION: 

APPLICANT: Landes, Gregory M. 
APPLICANT: Burn, Timothy C. 
APPLICANT: Connors, Timothy D. 
APPLICANT: Dackowski, William R. 
APPLICANT: Van Raay, Terence J. 
APPLICANT: Klinger, Katherine W. 

TITLE OF INVENTION: NOVEL HUMAN CHROMOSOME 16 GENES, 

TITLE OF INVENTION: COMPOSITIONS, METHODS OF MAKING AND USING SAME 
NUMBER OF SEQUENCES: 73 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: GENZYME CORPORATION 
STREET: One Mountain Road 
CITY: Framingham 
STATE: Massachusetts 
COUNTRY: United States of America 
ZIP: 01701 



; COMPUTER READABLE FORM: 

; MEDIUM TYPE: Floppy disk 

; COMPUTER: IBM PC compatible 

; OPERATING SYSTEM: PC-DOS/MS-DOS 

; SOFTWARE: PatentIn Release #1.0, Version #1.30 

CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/665,259 
FILING DATE: 17-JUN-1996 
CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
; NAME: Dugan, Deborah A. 

REGISTRATION NUMBER: 37,315 
REFERENCE/ DOCKET NUMBER: IG5-9.1 
TELECOMMUNICATION INFORMATION: 
; TELEPHONE: (508) 872-8400 

TELEFAX: (508) 872-5415 
; INFORMATION FOR SEQ ID NO: 25: 
; SEQUENCE CHARACTERISTICS: 
; LENGTH: 1684 amino acids 

; TYPE: amino acid 

TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-665-259-25 



Query Match 7.7%; Score 259.5; DB 3; Length 1684; 

Best Local Similarity 30.1%; Pred. No. 2.1e-16; 

Matches 85; Conservative 48; Mismatches 102; Indels 47; Gaps 13; 

Qy 30 VTGTEARHSLGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSS 89 

I I : : I I : I I: I I : : I : : I : I I I : I I : 

Db 507 VAGIKIKH LSKVFRVGNK DRAAVRDLNLNLYEGQITVLLGHN 548 

Qy 90 GSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRD--QFQDCFSYVLQSDVFLSSLTVRE 147 

I : I I I I I : : I I I : : : I I : : I I : I h MM 

Db 549 GAGKTTTLSMLTGLFPPT SGRAYISGYEISQDMVQIRKSLGLCPQHDILFDNLTVAE 605 



Qy 148 TLRYTAML-ALCRSSADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAA 206 

I : I I I I | | | | : : |: | :| I ||::|| 

Db 606 HLYFYAQLKGLSR QKCPEEVKQMLHIIGLEDKWNSRSRF--LSGGMRRKLSIGI 657 

Qy 207 QLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIH-QPRSELFQHFDKIA 265 

I : I I : : I I I I I : I : I :: I I I : : I I : :: I I : : I I : I I 

Db 658 ALIAGSKVLILDEPTSGMDAISRRAIWDLL-QRQKSDRTIVLTTHFMDEADLLG--DRIA 714 



Qy 266 ILTYGELVFCGTP EEMLGFFNNCGYPC PEHSNPFD 300 

1:11111: :: I I I MM 

Db 715 IMAKGELQCCGSSLFLKQKYG AGYHMTLVKEPHCNPED 752 



RESULT 8 

US-08-762-500-25 

; Sequence 25, Application US/08762500 
; Patent No. 6030806 
; GENERAL INFORMATION: 

APPLICANT: Landes, Gregory M. 

APPLICANT: Burn, Timothy C. 
; APPLICANT: Connors, Timothy D. 



APPLICANT: Dackowski, William R. 
APPLICANT: Van Raay, Terence J. 
APPLICANT: Klinger, Katherine W. 

TITLE OF INVENTION: NOVEL HUMAN CHROMOSOME 16 GENES, 

TITLE OF INVENTION: COMPOSITIONS, METHODS OF MAKING AND USING SAME 
NUMBER OF SEQUENCES: 83 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: GENZYME CORPORATION 
STREET: One Mountain Road 
CITY: Framingham 
STATE: Massachusetts 
COUNTRY: United States of America 
ZIP: 01701 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/762,500 
FILING DATE: 09-DEC-1996 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/665,259 
FILING DATE: 17-JUN-1996 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/US96/10469 
FILING DATE: 17-JUN-1996 
ATTORNEY/AGENT INFORMATION: 
NAME: Dugan, Deborah A. 
REGISTRATION NUMBER: 37,315 
REFERENCE/ DOCKET NUMBER: IG5-9.3 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (508) 872-8400 
TELEFAX: (508) 872-5415 
INFORMATION FOR SEQ ID NO: 25: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1684 amino acids 
TYPE: amino acid 
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-762-500-25 

Query Match 7.7%; Score 259.5; DB 3; Length 1684; 

Best Local Similarity 30.1%; Pred. No. 2.1e-16; 

Matches 85; Conservative 48; Mismatches 102; Indels 47; Gaps 13; 

Qy 30 VTGTEARHSLGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSS 89 

I I : : I I : I I: I I : : I : : I : I I I : I I : 

Db 507 VAGIKIKH LSKVFRVGNK DRAAVRDLNLNLYEGQITVLLGHN 548 

Qy 90 GSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRD--QFQDCFSYVLQSDVFLSSLTVRE 147 

| : | M I I : : I I I : : : I I : : I I : M : : M I 

Db 549 GAGKTTTLSMLTGLFPPT SGRAYISGYEISQDMVQIRKSLGLCPQHDILFDNLTVAE 605 

Qy 148 TLRYTAML-ALCRSSADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAA 206 

| : | I M I M I : : I : I : M I I :: I I 



Db 606 HLYFYAQLKGLSR QKCPEEVKQMLHIIGLEDKWNSRSRF--LSGGMRRKLSIGI 657 



Qy 207 QLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIH-QPRSELFQHFDKIA 265 

I: I I I 1:1:1 :: I II : : II :: = l I ::l Ml 

Db 658 ALIAGSKVLILDEPTSGMDAISRRAIWDLL-QRQKSDRTIVLTTHFMDEADLLG--DRIA 714 

Qy 266 ILTYGELVFCGTP EEMLGFFNNCGYPC PEHSNPFD 300 

I: I I I I I: :: I II I I I I 

Db 715 IMAKGELQCCGSSLFLKQKYG AGYHMTLVKEPHCNPED 752 



RESULT 9 

US-08-762-500-75 

Sequence 75, Application US/08762500 
Patent No. 6030806 
GENERAL INFORMATION: 

APPLICANT: Landes, Gregory M. 
APPLICANT: Burn, Timothy C. 
APPLICANT: Connors, Timothy D. 
APPLICANT: Dackowski, William R. 
APPLICANT: Van Raay, Terence J. 
APPLICANT: Klinger, Katherine W. 

TITLE OF INVENTION: NOVEL HUMAN CHROMOSOME 16 GENES, 

TITLE OF INVENTION: COMPOSITIONS, METHODS OF MAKING AND USING SAME 
NUMBER OF SEQUENCES: 83 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: GENZYME CORPORATION 
STREET: One Mountain Road 
CITY: Framingham 
STATE: Massachusetts 
COUNTRY: United States of America 
ZIP: 01701 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.30 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/762,500 
FILING DATE: 09-DEC-1996 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/665,259 
FILING DATE: 17-JUN-1996 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/US96/10469 
FILING DATE: 17-JUN-1996 
ATTORNEY/AGENT INFORMATION: 
NAME: Dugan, Deborah A. 
REGISTRATION NUMBER: 37,315 
REFERENCE/ DOCKET NUMBER: IG5-9.3 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (508) 872-8400 
TELEFAX: (508) 872-5415 
INFORMATION FOR SEQ ID NO: 75: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1704 amino acids 



TYPE: amino acid 
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-762-500-75 

Query Match 7.7%; Score 259.5; DB 3; Length 1704; 

Best Local Similarity 30.1%; Pred. No. 2.2e-16; 

Matches 85; Conservative 48; Mismatches 102; Indels 47; Gaps 13; 

Qy 30 VTGTEARHSLGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSS 89 

| | : : | | : | |: | | : : | : : | : | | | : | | = 

Db 527 VAGIKIKH LSKVFRVGNK DRAAVRDLNLNLYEGQITVLLGHN 568 

Qy 90 GSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRD--QFQDCFSYVLQSDVFLSSLTVRE 147 

|:||||| ::| | | :::| |: :| | : | h :||| | 

Db 569 

Qy 148 

| | | | ||| | : : | : | : | | | | :: | | 

Db 626 fAQLKGLSR QKCPEEVKQMLHIIGLEDKWNSRSRF--LSGGMRRKLSIGI 677 

Qy 207 

| | : : | | | | | : | : | : : | | | : : | | : : : | | : : | | : | | 

Db 678 

Qy 266 ILTYGELVFCGTP EEMLGFFNNCGYPC PEHSNPFD 300 

| : | | | | | : :: | || | | | | 

Db 735 IMAKGELQCCGSSLFLKQKYG AGYHMTLVKEPHCNPED 772 



RESULT 10 

US-09-543-681A-5411 

; Sequence 5411, Application US/09543681A 

; Patent No. 6605709 

; GENERAL INFORMATION: 

; APPLICANT: GARY BRETON 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PROTEUS MIRABILIS FOR 

; TITLE OF INVENTION: DIAGNOSTICS AND THERAPEUTICS 

; FILE REFERENCE: 2709.1002-001 

; CURRENT APPLICATION NUMBER: US/09/543,681A 

; CURRENT FILING DATE: 2000-04-05 

; PRIOR APPLICATION NUMBER: US 60/128,706 

; PRIOR FILING DATE: 1999-04-09 

; NUMBER OF SEQ ID NOS: 8344 

; SEQ ID NO 5411 

; LENGTH: 653 

TYPE: PRT 
; ORGANISM: Proteus mirabilis 
US-09-543-681A-5411 

Query Match 7.3%; Score 247; DB 4; Length 653; 

Best Local Similarity 21.8%; Pred. No. 8.1e-16; 

Matches 95; Conservative 89; Mismatches 136; Indels 116; Gaps 18; 

Qy 66 DRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNG CELR 122 

| :| : | | | :|::: |: | : | | | | | : || : : | | | : : M || :: 

Db 26 DTWLDQISLTINAGEMVAIIGASGSGKSTLMN-ILGCLDKPSS--GEYKVAGQCVADME 82 



Qy 123 RDQF QDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSL 178 

|| :: | :: | :: | | : : I : I I : :: I :: I I I 

Db 83 SDQLAALRREHFGFIFQRYHLMAHLTAEQNVEIPAIYA--GKSTEQRKERARALLTRLGL 140 

Qy 179 SHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAE 238 

I : : : I : I I : : : I I I I I I : : I : : I I I I II : : : : : I : 
Db 141 AERI--HYRPSQLSGGQQQRVSIARALMNGGEVILADEPTGALDSQSGKEVMAILKQ 195 

Qy 239 LARRDRIVIVTIHQPRSELFQHFDKIAI LTYGELVFCGTPEEMLGFFNNCGYPCPEHSNP 298 

I : : I I : I I : I I : I : I : : : II I I I 

Db 196 LNQQGHTVIIVTHDPL--IAQQADRIIEIKDGQIISDN NN HHSAP 238 

Qy 299 FDFYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFK 358 

| : : | : : : I ::: 

Db 239 VKKVP PAI QTAS YFHQVI 256 

Qy 359 TKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQ 418 

|: | I :: I I :: :: :|:|: : :: : I : 

Db 257 GRFTQALNMAWRAMVVNKI RTLLTML-GI 1 1 GIASWTI I VIGDAAK 302 

Qy 419 D RVGLL YQ LVGAT P YT GMLNAVN L F PML RAVS DQ ESQDGLYHKWQMLLAYV 469 

Ml : : I I I : : : : I I I : | | : : I : 

Db 303 DRVLADIKAIGA NTIDIYPGKELGSDSPEDKQSLTIQDVDALKQQ SYI 350 

Qy 470 LHVLPFSVIATVIFSS 485 

II : I I I 

Db 351 QSVTP QIYFSS 361 



RESULT 11 

US-09-543-681A-8215 

; Sequence 8215, Application US/09543681A 

; Patent No. 6605709 

; GENERAL INFORMATION: 

; APPLICANT: GARY BRETON 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PROTEUS MIRABILIS FOR 

; TITLE OF INVENTION: DIAGNOSTICS AND THERAPEUTICS 

; FILE REFERENCE: 2709.1002-001 

; CURRENT APPLICATION NUMBER: US/09/543,681A 

; CURRENT FILING DATE: 2000-04-05 

; PRIOR APPLICATION NUMBER: US 60/128,706 

; PRIOR FILING DATE: 1999-04-09 

; NUMBER OF SEQ ID NOS: 8344 

; SEQ ID NO 8215 

; LENGTH: 210 

TYPE: PRT 
; ORGANISM: Proteus mirabilis 
US-09-543-681A-8215 

Query Match 7.3%; Score 246.5; DB 4; Length 210; 

Best Local Similarity 33.7%; Pred. No. 1.4e-16; 

Matches 69; Conservative 43; Mismatches 80; Indels 13; Gaps 4; 
Qy 69 ILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQD 128 



I I : I I I : : I I : I I I I I I I I I I I I : I I : I I I : I I : : I 

Db 14 ILTEVSLHLEQGCCLGISGSSGSGKTTLLNAIAGYTDYTGDI VLANQNMNKLPVWQR 70 

Qy 129 CFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSHVADQMIGS 188 

|: I I I I:: I I : I : | I : : : : : : I : I : 

Db 71 PCRYLNQRLYLFPFLTVKQNLWLAQYAAKQKRS KEKEIALLEQMGIAHLATRYPSQ 126 

Qy 189 YNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLI^ 248 

I I I I : : I I : : I I : I I : : : : I I I : I I I I : I : I : : : I : 
Db 127 ISGGEQQRVALARALI SQPKLLLMDEPFSSLDWETRYQLWELI I SLKKQQITMI I 181 

Qy 249 TIHQPRSELFQHFDKIAILTYGELV 273 

I : II I I II : I : I : : I 

Db 182 VTHEPR-ELQALADKTLLLSNGKIV 205 



RESULT 12 

US-09-543-681A-7638 

Sequence 7638, Application US/09543681A 
Patent No. 6605709 
GENERAL INFORMATION: 
APPLICANT: GARY BRETON 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PROTEUS MIRABILIS FOR 

TITLE OF INVENTION: DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 2709.1002-001 
CURRENT APPLICATION NUMBER: US/09/543,681A 
CURRENT FILING DATE: 2000-04-05 
PRIOR APPLICATION NUMBER: US 60/128,706 
PRIOR FILING DATE: 1999-04-09 
NUMBER OF SEQ ID NOS: 8344 
SEQ ID NO 7638 
LENGTH: 373 
TYPE: PRT 

ORGANISM: Proteus mirabilis 
US-09-543-681A-7638 

Query Match 7.3%; Score 246; DB 4; Length 373; 

Best Local Similarity 26.4%; Pred. No. 4e-16; 

Matches 74; Conservative 62; Mismatches 120; Indels 24; Gaps 7; 

Qy 47 SVSNRVGPWWN IKSCQQKWDR-QILKDVSLYIESGQIMCILGSSGSGKTTLLDAIS 101 

: | | | | I : : I I : : I II : I : I I : : I : I I I I I I I II I I I : 

Db 1 NVSNNAKQGQNMSIEINHVTKYFDRTEVLHDVNLTVNSGEMMALLGPSGSGKTTLLRIIA 60 

Qy 102 GRLRRTGTLEGEVFVNGCELRRDQFQD-CFSYVLQSDVFLSSLTVRETLRY--TAMLALC 158 

| : | ||:: I :: I :: : I I : | | | : : I : 

Db 61 GLEHQT EGKICFAGQDVSRLHARERKVGFVFQHYALFRHMTVFENIAFGLTVLPRRE 117 

Qy 159 RSSADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLD 218 

| : :||| :: : I |:| : :| |:::||::| I :|::::|| 

Db 118 RPNKAAI DKKVTQLLEMIQLPHLAQRYPAQ LSGGQKQRVALARALAVEPQILLLD 172 

Qy 219 EPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAI LTYGELVFCGTP 278 

|| || : : III : I : : | : | : I I : I : : III 

Db 173 EPFGALDAKVRTELRSWLRELHSELKFTSVFVTHDQQEAMEVADRIVIMGNGKIEQVGTP 232 



Qy 279 EEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIE 318 

: : : : I I : | | : : : : : : 

Db 233 QQV WHTPESRFVLEFLGDVNHLQGEINGAQLQ 264 



RESULT 13 

US-09-134-001C-3832 

Sequence 3832, Application US/09134001C 
Patent No. 6380370 
GENERAL INFORMATION: 
APPLICANT: Lynn Doucette-Stamm et al 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO STAPHYLOCOCCUS 

TITLE OF INVENTION: EPIDERMIDIS FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: GTC-007 

CURRENT APPLICATION NUMBER: US/09/134,001C 
CURRENT FILING DATE: 1998-08-13 
PRIOR APPLICATION NUMBER: US 60/064,964 
PRIOR FILING DATE: 1997-11-08 
PRIOR APPLICATION NUMBER: US 60/055,779 
PRIOR FILING DATE: 1997-08-14 
NUMBER OF SEQ ID NOS: 5674 
SEQ ID NO 3832 
LENGTH: 242 
TYPE: PRT 

ORGANISM: Staphylococcus epidermidis 
US-09-134-001C-3832 

Query Match 7.2%; Score 243; DB 4; Length 242; 

Best Local Similarity 27.8%; Pred. No. 3.9e-16; 

Matches 74; Conservative 54; Mismatches 90; Indels 48; Gaps 12; 

Qy 54 PWWNIKSCQQKWD-RQILKDVSLYIESGQIMCILGSSGSGKTT LLDAISGRLRR 106 

| |||: : | : : : | : | : : | : | | : : : | : | | | | | | : | ||| | 

Db 2 

Qy 107 iGEVFVNGCELRR DQFQDCFSYVLQS-DVFLSSLTVRETLRYTAMLALCRS 160 

| : | ||: | : | | : : : | : : |||: 

Db 57 

Qy 161 

: | : : : : | | | | | : | | : : : | | : M | : | | | : | | | 

Db 111 

Qy 221 TTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHF DKI AI LT YGELVFC 275 

| : | | : : : : : | | : : : | : : | | | : : | : | 

Db 166 TSALDPEWGDVLKVMRQLANESMTMVIVTHE MNFAKEISDKWFMADGVWES 219 

Qy 276 

||: : || |:|| : | 

Db 220 ITPQNI FEN PQHSRTENF 237 



RESULT 14 
US-09-672-810-2 

; Sequence 2, Application US/09672810 
; Patent No. 6617450 



; GENERAL INFORMATION: 

; APPLICANT: STOCKER, PENNY J. 

; APPLICANT: STEIMEL-CRESPI, DOROTHY T. 

; APPLICANT: CRESPI, CHARLES L. 

TITLE OF INVENTION: P-GLYCOPROTEINS AND USES THEREOF 
; FILE REFERENCE: G0307/7018 
; CURRENT APPLICATION NUMBER: US/09/672,810 
; CURRENT FILING DATE: 2000-09-28 

PRIOR APPLICATION NUMBER: US 60/156,921 
; PRIOR FILING DATE: 1999-09-28 
; PRIOR APPLICATION NUMBER: US 60/158,818 
; PRIOR FILING DATE: 1999-10-12 
; NUMBER OF SEQ ID NOS: 18 

SOFTWARE: FastSEQ for Windows Version 3.0 
; SEQ ID NO 2 

LENGTH: 1280 
; TYPE: PRT 

; ORGANISM: Macaca fascicularis 
US-09-672-810-2 



Query Match 7.2%; Score 242; DB 4; Length 1280; 

Best Local Similarity 21.9%; Pred. No. 8.1e-15; 

Matches 125; Conservative 92; Mismatches 181; Indels 174; Gaps 24; 

Qy 15 PHINRGSLSSLEQGSVTGT-EARHSLGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDV 73 

I I : I I : :: I I I : : I I I I : I I I : 

Db 373 PSIDSYSKSGHKPDNIKGNLEFRN VHFSYPSRKEV KILKGL 413 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQ FQDCF 130 

: I : : I I I : : : I : I I I I : I : : I I I I I : I : : I : : 

Db 414 NLKVQSGQTVALVGNSGCGKSTTVQLMQ RLYDPTEGMVSVDGQDIRTINVRFLREII 470 

Qy 131 SYVLQSDVFLSSLTVRETLRYTAMLALCRS SADFYNKKVEAVMTE LSLSHVAD 183 

I I I I : I : I : I I I :: I : I : I I 

Db 471 GVVSQEPV-LFATTIAENIRY GREDVTMDEIEKAVKEANAYDFIMKLPQKFD 521 



Qy 184 QMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLA-ELARR 242 

: : I ' : | | : : : | : : M I : : : I I : : : I I I I : I I I : : I : : I : I I : 
Db 522 TLVGERG-AQLSGGQKQRIAIARALVRNPKILLLDEATSALD--TESEAVVQVALDKARK 578 

Qy 243 DRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFY 302 

I II I I :: I II | :| | :|:: I 
Db 579 GRTTIVIAH--RLSTVRNADVIAGFDDGVIVEKGNHDELM KEKGIY 622 

Qy 303 MDLTSVDTQSREREI ETYKRVQMLECAFKES DIYHKILENIERARYL 349 

I : : I II: I : : I I : : I : : : I I 

Db 623 FKLVTMQTAGNEIELENAADESKSEIDTLEMSSHDSGSSLIRKRSTRRSVRGSQGQDRKL 682 

Qy 350 KT LPMVPF KTKDP PGMFGKLGVLLRRVTRNLMRN 383 

I : I I I 1:1 I : I : : : II 

Db 683 STKEALDESIPPVSFWRIMKLNLTEWPYFVVGVFCAIINGGLQPAFAVIFSKIIGIFTRN 742 



Qy 384 KQAVIMRLVQNLIMGLFLIFYL 405 

I I II III: • I : I 

Db 743 DDAETKRQNSNLFSLLFLVLGIVSFITFFLQGFTFGKAGEILTKRLRYMVFRSMLRQDVS 802 

Qy 406 L RVQNNT — L KGAVQD RVGL L YQ LVGAT P YT GMLNAVN L F PMLRAVS 450 



I : I : : I I I : I : : : I : | I I : : : 
Db 803 WFDDPKNTTGALTTRLANDAAQVKGAIGSRLAIITQNI-ANLGTGIIIS 850 

Qy 451 DQESQDGLYHKWQMLLAYVLHVLPFSVIATVI 482 

I : I I : I : I :: I III: 
Db 851 LIYGWQLTL-LLLAIVPIIAIAGVV 874 



RESULT 15 
US-09-672-810-4 

; Sequence 4, Application US/09672810 

; Patent No. 6617450 

; GENERAL INFORMATION: 

; APPLICANT: STOCKER, PENNY J. 

; APPLICANT: STEIMEL-CRESPI, DOROTHY T. 

; APPLICANT: CRESPI, CHARLES L. 

; TITLE OF INVENTION: P-GLYCOPROTEINS AND USES THEREOF 

; FILE REFERENCE: G0307/7018 

; CURRENT APPLICATION NUMBER: US/09/672,810 

; CURRENT FILING DATE: 2000-09-28 

; PRIOR APPLICATION NUMBER: US 60/156,921 

; PRIOR FILING DATE: 1999-09-28 

; PRIOR APPLICATION NUMBER: US 60/158,818 

; PRIOR FILING DATE: 1999-10-12 

; NUMBER OF SEQ ID NOS: 18 

; SOFTWARE: FastSEQ for Windows Version 3.0 
; SEQ ID NO 4 

LENGTH: 1283 

TYPE: PRT 

ORGANISM: Macaca fascicularis 
US-09-672-810-4 

Query Match 7.2%; Score 242; DB 4; Length 1283; 

Best Local Similarity 21.9%; Pred. No. 8.1e-15; 

Matches 125; Conservative 92; Mismatches 181; Indels 174; Gaps 24; 

Qy 15 PHINRGSLSSLEQGSVTGT-EARHSLGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDV 73 

I I : I I : : : I I I : : I I I I : I I I : 

Db 376 PSIDSYSKSGHKPDNIKGNLEFRN VHFSYPSRKEV KILKGL 416 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQ FQDCF 130 

: I : : I I I : : : I : I I I I : I : : I I I I I : I : : I : : 

Db 417 NLKVQSGQTVALVGNSGCGKSTTVQLMQ RLYDPTEGMVSVDGQDIRTINVRFLREII 473 

Qy 131 SYVLQSDVFLSSLTVRETLRYTAMLALCRS SADFYNKKVEAVMTE LSLSHVAD 183 

I I I I : I : I : I I I : : I : I : I I 

Db 474 GVVSQEPV-LFATTIAENIRY GREDVTMDEIEKAVKEANAYDFIMKLPQKFD 524 

Qy 184 QMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLA-ELARR 242 

: : I : I I : : : I : : I I I : : : I I : : : I I I I : I I I : : I : : I : I I : 

Db 525 TLVGERG-AQLSGGQKQRIAIARALVRNPKILLLDEATSALD--TESEAVVQVALDKARK 581 

Qy 243 DRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFY 302 

I I I I I : : I I I | : | | : | : : I 
Db 582 GRTTIVIAH--RLSTVRNADVIAGFDDGVIVEKGNHDELM KEKGIY 625 



Qy 303 MDLTSVDTQSREREI ETYKRVQMLECAFKES DIYHKILENIERARYL 349 

I :: I ||: |: : II : :| : : : I I 

Db 626 FKLVTMQTAGNEIELENAADESKSEIDTLEMSSHDSGSSLIRKRSTRRSVRGSQGQDRKL 685 

Qy 350 KT LPMVPF KTKDP PGMFGKLGVLLRRVTRNLMRN 383 

I :| I I hi I: |: :: I I 

Db 686 STKEALDESIPPVSFWRIMKLNLTEWPYFVVGVFCAIINGGLQPAFAVIFSKIIGIFTRN 745 

Qy 384 KQAVIMRLVQNLIMGLFLIFYL 405 

I I II III: hi 
Db 746 DDAETKRQNSNLFSLLFLVLGIVSFITFFLQGFTFGKAGEILTKRLRYMVFRSMLRQDVS 805 

Qy 406 LRVQNNT — LKGAVQDRVGLLYQLVGAT P YT GMLNAVN L FPMLRAVS 450 

I : I : : | | | : | : : : | : | I I : : : 
Db 806 WFDDPKNTTGALTTRLANDAAQVKGAIGSRLAIITQNI-ANLGTGIIIS 853 

Qy 451 DQESQDGLYHKWQMLLAYVLHVLPFSVIATVI 482 

I : I I : I : I : : I III: 
Db 854 LIYGWQLTL-LLLAIVPIIAIAGVV 877 



Search completed: February 27, 2004, 07:20:12 
Job time : 16.7734 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 

Run on: February 27, 2004, 06:44:33 ; Search time 14.5272 Seconds 

(without alignments) 
4317.206 Million cell updates/sec 
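The throughput figure is consistent with the usual definition of cell updates per second for a dynamic-programming database scan: query length times residues searched, divided by search time. With the 652-residue query, the 96191526 residues on the Searched line and the 14.5272 s search time, that gives roughly 4317.2 million updates per second. A minimal sketch of the arithmetic, assuming that definition (the function name mcups is illustrative, not part of GenCore):

```python
def mcups(query_length: int, db_residues: int, seconds: float) -> float:
    """Million cell updates per second for a full query-vs-database DP scan (assumed definition)."""
    cells = query_length * db_residues  # one DP cell per (query residue, database residue) pair
    return cells / seconds / 1e6


print(f"{mcups(652, 96_191_526, 14.5272):.1f}")  # about 4317.2
```

The small difference from the reported 4317.206 is within the rounding of the printed search time.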



Title: US-09-989-981A-2 
Perfect score: 3369 
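Query Match in the summaries and alignment headers appears to be the raw score expressed as a percentage of this perfect score (the query aligned against itself): 685/3369 is 20.3%, 630/3369 is 18.7% and 628/3369 is 18.6%, matching the PIR results below, and the earlier patent-database percentages (for example 330.5 giving 9.8%) are consistent with the same denominator. This is inferred from the reported numbers rather than taken from GenCore documentation; a sketch:

```python
PERFECT_SCORE = 3369  # "Perfect score" reported for query US-09-989-981A-2


def query_match(score: float, perfect_score: float = PERFECT_SCORE) -> float:
    """Raw score as a percentage of the query's self-alignment score (inferred definition)."""
    return 100.0 * score / perfect_score


for accession, score in [("JC7860", 685), ("C84423", 630), ("S77690", 628)]:
    print(accession, f"{query_match(score):.1f}%")
```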



Sequence: 1 MGELPFLSPEGARGPHINRG PALVILGIVIFKVRDYLISR 652 



Scoring table: BLOSUM62 

Gapop 10.0 , Gapext 0.5 
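The run is a Smith-Waterman ("sw model") local alignment scored with BLOSUM62 and affine gap penalties (gap open 10.0, gap extension 0.5). The sketch below shows the standard Gotoh recurrences for local alignment with affine gaps. Two assumptions are labeled in the code: a gap of length k is charged gap_open + (k - 1) * gap_ext (GenCore's exact convention is not stated in this report), and a hypothetical toy_score function (+5 for identity, -4 otherwise) stands in for BLOSUM62 so the example needs no matrix file.

```python
from typing import Callable, List

NEG_INF = float("-inf")


def sw_affine(a: str, b: str, score: Callable[[str, str], float],
              gap_open: float = 10.0, gap_ext: float = 0.5) -> float:
    """Best local alignment score of a vs b with affine gaps (Gotoh recurrences).

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_ext.
    """
    n, m = len(a), len(b)
    # H[i][j]: best local score of an alignment ending at a[i-1], b[j-1]
    # E[i][j]: best score ending with b[j-1] opposite a gap (gap in a)
    # F[i][j]: best score ending with a[i-1] opposite a gap (gap in b)
    H: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    E: List[List[float]] = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F: List[List[float]] = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_ext)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_ext)
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])  # 0.0 lets a local alignment restart
            best = max(best, H[i][j])
    return best


def toy_score(x: str, y: str) -> float:
    """Hypothetical stand-in for BLOSUM62: +5 for identical residues, -4 otherwise."""
    return 5.0 if x == y else -4.0


# Ten identities bridged by a 3-residue gap: 10 * 5 - (10.0 + 2 * 0.5) = 39.0
print(sw_affine("AAAAACCCAAAAA", "AAAAAAAAAA", toy_score))
```

With the gap-open penalty raised far enough (for example to 100), the same pair falls back to the best ungapped run, 25.0.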

Searched: 283366 seqs, 96191526 residues 

Total number of hits satisfying chosen parameters: 283366 



Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 



Database : PIR_78:* 
1: pirl:* 
2: pir2:* 
3: pir3:* 
4: pir4:* 



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
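The report does not say how GenCore models the score distribution. As an illustration only, one common way to turn a distribution of database scores into an expected count of chance hits is to fit an extreme-value (Gumbel) distribution, here by the method of moments, and multiply its upper-tail probability by the number of sequences searched (283366 in this run). The function names, the Gumbel model and the fitting method are all assumptions, and the background scores in the usage lines are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(scores: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments fit of a Gumbel (maximum) distribution; returns (mu, lam)."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6.0))  # variance = pi^2 / (6 * lam^2)
    mu = mean - EULER_GAMMA / lam          # mean = mu + gamma / lam
    return mu, lam


def predicted_number(score: float, mu: float, lam: float, n_sequences: int) -> float:
    """Expected number of database sequences scoring >= score by chance."""
    # P(S >= s) = 1 - exp(-exp(-lam * (s - mu))); expm1 keeps precision far out in the tail
    tail = -math.expm1(-math.exp(-lam * (score - mu)))
    return n_sequences * tail


# Hypothetical background scores, for illustration only (not taken from this search)
background = [18.0, 21.5, 19.0, 25.0, 22.0, 20.5, 23.0, 19.5, 24.0, 21.0]
mu, lam = fit_gumbel(background)
print(predicted_number(40.0, mu, lam, n_sequences=283366))
```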
ALIGNMENTS 



RESULT 1 
JC7860 

brain multidrug resistance protein, BMDP - pig 
C; Species: Sus scrofa domestica (domestic pig) 

C;Date: 18-Nov-2002 #sequence_revision 18-Nov-2002 #text_change 31-Mar-2003 
C;Accession: JC7860 
R;Eisenblaetter, T.; Galla, H.J. 

Biochem. Biophys. Res. Commun. 293, 1273-1278, 2002 

A;Title: A new multidrug resistance protein at the blood-brain barrier. 

A;Reference number: JC7860; MUID: 22050127; PMID: 12054514 

A;Accession: JC7860 

A;Molecule type: mRNA 

A;Residues: 1-656 <EIS> 

A;Cross-references: GB:AJ420927 

A; Experimental source: brain 

C;Comment: This protein, a new transport protein of the ATP-binding cassette 
(ABC) superfamily of transporters, is expressed in porcine brain capillary 
endothelial cells, plays an important role in the exclusion of xenobiotics from 
the brain, and participates in drug transport across the blood-brain barrier; it 
is therefore considered an efflux pump at the cerebral endothelium. 



C;Genetics: 
A;Gene: bmdp 



Query Match 20.3%; Score 685; DB 2; Length 656; 

Best Local Similarity 29.7%; Pred. No. 3.8e-43; 

Matches 187; Conservative 130; Mismatches 228; Indels 84; Gaps 19; 
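Best Local Similarity is consistent with Matches divided by the total number of aligned columns, taken as Matches + Conservative + Mismatches + Indels. For this entry that is 187 / (187 + 130 + 228 + 84) = 187 / 629, about 29.7%, and the other headers in this listing agree as well (for example 188 / 622, about 30.2%, for C84423). The definition is inferred from the reported numbers rather than taken from GenCore documentation; a sketch:

```python
def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Identical positions as a percentage of all aligned columns (inferred definition)."""
    columns = matches + conservative + mismatches + indels
    if columns == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * matches / columns


# (Matches, Conservative, Mismatches, Indels) copied from alignment headers in this listing
for name, counts in [("JC7860", (187, 130, 228, 84)),
                     ("C84423", (188, 104, 244, 86)),
                     ("US-09-489-039A-9127", (78, 58, 118, 43))]:
    print(name, f"{best_local_similarity(*counts):.1f}%")
```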

Qy 31 TGTEARHSLGVLHVSY-SVSNRVGPWWNIKS CQQKWDRQILKDVSLYIESGQIMCI 85 

: I : I I : I: : II * I I I :: :::| I ::: :: I : I 

Db 24 SSNELKTSAGGAVLSFHDICYRV KVKSGFLFCRKTVEKEILTNINGIMKPG-LNAI 78 

Qy 86 LGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTV 145 

| | : | M : : I I I : : I | | |:| :|| I |: I I - I II : :||| 

Db 79 LGPTGGGKSSLLDVXAARKDPHG-LSGDVLINGAP-RPANFKCNSGYWQDDVVMGTLTV 136 

Qy 146 RETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSI 204 

I I I : : : I I I : : I : : : hill III : I : I : I I I I : I I I 
Db 137 RENLQFSAALRLPTTMTNHEKNERINMVIQELGLDKVADSKVGTQFIRGVSGGERKRTSI 196 

Qy 205 AAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKI 264 

I : I : I I :: I I I I I I I I I III ::IM :::: I =1 =11111 :|: II : 
Db 197 AMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSIFKLFDSL 256 

Qy 265 AILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ S 312 

: I I I : I I 111:1:111 : : I I I I : : I : : I : : 
Db 257 TLIASGRLMFHGPAREALGYFASIGYNCEPYNNPADFFLDVINGDSSAWLSRADRDEGA 316 

Qy 313 RE RE I ET YKRVQML E — C AF KESDI YHKI LENIERAR 347 

: I I | ::: . I I |:| :| :: 

Db 317 QEPEEPPEKDT PLI DKLAAFYTNSS FFKDTKVELDQFSGGRKKKKS SVYKEVT YTTS FCH 376 

Qy 348 YLKTLPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFL--IFYL 405 

I : : II : | | : | | | : : : : : I : I I : III 

Db 377 QLRWIS RRS FKNLLGN PQAS VAQI I VT 1 1 LGLVI GAI FYD 416 

Qy 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465 

I : I | : | : | | : | : | : : :: I I I : : : I II 

Db 417 LK NDPSG-IQNRAGVLFFLTTNQCFSS-VSAVELLWEKKLFIHEYISGYYRVSSYF 471 

Qy 466 LAYVL-HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGI 524 

: I : I I :: :: | | : : I : I M I I I I : :: : : I I 

Db 472 FGKLLSDLLPMRMLPSIIFTCITYFLLGLKPAVGSFFIMMFTLM MVAYS AS SMALAI 528 

Qy 525 VQNPNIVNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFY 582 

: : I : I : : I I : : I I I : I : : : I III: : I I III 

Db 529 AAGQSWSVATLLMTISFVFMMIFSGLLVNLKTWPWLSWLQYFSIPRYGFSALQYNEFL 588 

Qy 583 GLNFTCGGSNTSMLNHPMCAITQGVQFIE 611 

I I I I I I : I II I ::: I 

Db 589 GQNF- C PGLNVTTNNTCS FAI CTGAEYLE 616 



RESULT 2 
C84423 

probable ABC transporter [imported] - Arabidopsis thaliana 
C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 02-Feb-2001 #sequence_revision 02-Feb-2001 #text_change 02-Feb-2001 



C; Accession: C84423 

R;Lin, X.; Kaul, S.; Rounsley, S.D.; Shea, T.P.; Benito, M.I.; Town, C.D.; 
Fujii, C.Y.; Mason, T.M.; Bowman, C.L.; Barnstead, M.E.; Feldblyum, T.V.; Buell, 
C.R.; Ketchum, K.A.; Lee, J.J.; Ronning, C.M.; Koo, H.; Moffat, K.S.; Cronin, 
L.A.; Shen, M.; VanAken, S.E.; Umayam, L.; Tallon, L.J.; Gill, J.E.; Adams, 
M.D.; Carrera, A.J.; Creasy, T.H.; Goodman, H.M.; Somerville, C.R.; Copenhaver, 
G.P.; Preuss, D.; Nierman, W.C.; White, O.; Eisen, J.A.; Salzberg, S.L.; Fraser, 
C.M.; Venter, J.C. 
Nature 402, 761-768, 1999 

A;Title: Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana. 

A;Reference number: A84420; MUID: 20083487; PMID: 10617197 
A; Accession: C84423 
A; Status: preliminary 
A;Molecule type: DNA 
A; Residues: 1-725 <STO> 

A;Cross-references: GB:AE002093; NID:g4262239; PIDN:AAD14532.1; GSPDB:GN00139 
C; Genetics : 
A;Gene: At2g01320 
A;Map position: 2 

Query Match 18.7%; Score 630; DB 2; Length 725; 

Best Local Similarity 30.2%; Pred. No. 5.6e-39; 

Matches 188; Conservative 104; Mismatches 244; Indels 86; Gaps 18; 

Qy 55 WWNIKSC QQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISG R 103 

| M :| I I : I I : I I : I::: I M I I II I I I I I : : M I 

Db 72 WRNI-TCSLSDKSSKSVRFLLKNVSGEAKPGRLLAIMGPSGSGKTTLLNVLAGQLSLSPR 130 

Qy 104 LRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCR-SSA 162 

| :| M Ml :: :: I I I : I I II I I I I I : I I I Ml 

Db 131 LHLSGLLE VNGKPSSSKAYK--LAFVRQEDLFFSQLTVRETLSFAAELQLPEISSA 184 

Qy 163 DFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTT 222 

: :: | :: :| I II : I III ll::|:|:| :|: I I: Mill 

Db 185 EERDEYVNNLLLKLGLVSCADSCVGDAKVRGISGGEKKRLSLACELIASPSVIFADEPTT 244 

Qy 223 GLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAI LTYGELVFCG-TPEEM 281 

Ml I ::: I MM II M I I I I :: II I Ml I I M I M 
Db 245 GLDAFQAEKVMETLQKLAQDGHTVICSIHQPRGSVYAKFDDIVLLTEGTLVYAGPAGKEP 304 

Qy 282 LGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILE 341 

I : I I I : I I I I I I M Mill II : II I I I I = 
Db 305 LTYFGNFGFLCPEHVNPAEFLADLISVDYSSSETVYSSQKRVHALVDAFSQ 355 

Qy 342 NIERARYLKTLPMVPFKTKD PPGMFGKLGVLLRR VTRNLM 381 

| | | I: I : : MIM M: 

Db 356 RSSSVLYATPLSMKEETKNGMRPRRKAIVERTDGWWRQFFLLLKRAWMQASRDGP 410 

Qy 382 RNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVN 441 

|| | : : | | : I : : : : : || I M I I I : I 

Db 411 TNKVRARMSVASAVIFG— SVFWRM GKS QT S I QDRMGLLQVAAI NT AMAALT KT VG 464 

Qy 442 LFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFG 501 

:|| II: |: II ||: : :| : : I : I I I I :MII 

Db 465 VFPKERAIVDRERSKGSYSLGPYLLSKTIAEIPIGAAFPLMFGAVLYPMARLNPTLSRFG 524 



Qy 502 YFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSI-VALLSISGLLIGSGFIRNIQEMPIPL 560 

| : : | : : | : : : : I : : : : : I : I I I 

Db 525 KFCGI VTVES FAAS AMGLTVGAMVP STEAAMAVGP S LMTV — FI VFGGYYVNADNT P 1 1 F 582 

Qy 561 KILGYFTFQKYCCEILWNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPG 616 

: : : : : : I : I I I I I I II : | | : | : I 

Db 583 RWIPRASLIRWAFQGLCINEFSGLKF — DHQNT FDVQTGEQALERLS FGGRRI 633 

Qy 617 ATSR FTANFLIL 628 

III :: I : I : I 

Db 634 RETIAAQSRILMFWYSATYLLL 655 



RESULT 3 
S77690 

probable membrane protein YOL075c - yeast (Saccharomyces cerevisiae) 
N;Alternate names: hypothetical protein O1125; hypothetical protein O1130; 
hypothetical protein YOL074c 
C; Species : Saccharomyces cerevisiae 

C;Date: 21-Apr-1997 #sequence_revision 09-May-1997 #text_change 19-Apr-2002 

C;Accession: S77690; S66767; S66768 

R;Alexandraki, D.; Katsoulou, C.; Tzermia, M. 

submitted to the Protein Sequence Database, July 1996 

A; Reference number: S66756 

A;Accession: S77690 

A; Molecule type: DNA 

A; Residues: 1-1294 <ALE> 

A;Cross-references: EMBL: Z74816; MIPS:YOL075c 

A;Note: this is a revision to the sequence from reference S66756 
A;Accession: S66767 
A; Molecule type: DNA 

A;Residues: 1-179, 1 TTRTGVFLWKRED 1 <ALW> 

A;Cross-references : EMBL: Z74816 

A; Experimental source: strain S288C 

A;Note: this sequence has been revised in reference S77690 

A;Note: this was assumed to be protein YOL074c 

A;Accession: S66768 

A; Molecule type: DNA 

A; Residues: 200-1294 <ALF> 

A; Cross-references : EMBL: Z74817 

A; Experimental source: strain S288C 

A;Note: this sequence has been revised in reference S77690 

A;Note: this was assumed to be the complete sequence of protein YOL075c 

C; Genetics : 

A;Cross-references: SGD:S0005435 
A;Map position: 15L 
A;Note: YOL075c 

C;Superfamily: unassigned ATP-binding cassette proteins; ATP-binding cassette 
homology 

C; Keywords: ATP; nucleotide binding; P-loop; transmembrane protein 

F;45-263/Domain: ATP-binding cassette homology <ABC1> 
F;62-69/Region: nucleotide-binding motif A (P-loop) 
F;376-392/Domain: transmembrane #status predicted <TM1> 
F;469-485/Domain: transmembrane #status predicted <TM2> 
F;496-512/Domain: transmembrane #status predicted <TM3> 
F;606-622/Domain: transmembrane #status predicted <TM4> 
F;710-916/Domain: ATP-binding cassette homology <ABC2> 
F;727-734/Region: nucleotide-binding motif A (P-loop) 
F;1042-1058/Domain: transmembrane #status predicted <TM5> 
F;1125-1141/Domain: transmembrane #status predicted <TM6> 
F;1177-1193/Domain: transmembrane #status predicted <TM7> 
F;1269-1285/Domain: transmembrane #status predicted <TM8> 

Query Match 18.6%; Score 628; DB 2; Length 1294; 

Best Local Similarity 29.6%; Pred. No. 1.7e-38; 

Matches 183; Conservative 123; Mismatches 237; Indels 76; Gaps 21; 

Qy 67 RQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRT GTLEGEVFVNGCELR 122 

: : M : I : * I I I : I I II II : : I I : I I I II : : I : I - 

Db 707 KEILQSVNAIFKPGMINAIMGPSGSGKSSLLNLISGRLKSSVFAKFDTSGSIMFNDIQVS 766 

Qy 123 RDQFQDCFSYVLQ-SDVFLSSLTVRETLRYTAMLALCRSSADFYNKK^ 181 

|:: Ml I I | : : | M : I I I : I I I I : :: : :: III 

Db 767 ELMFKNVCSYVSQDDDHLLAALTVKETLKYAAALRLHHLTEAERMERTDNLIRSLGLKHC 826 

Qy 182 ADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL^ 241 

: :||: III ||:|||:: I I I I I ::: I I I I I : I I I I: I : : I : I I 
Db 827 ENNIIGNEFVKGISGGEKRRVTMGVQLLNDPPILLLDEPTSGLDSFTSATILEILEKLCR 886 

Qy 242 -RDRIVI VTIHQPRSELFQHFDKIAI LT-YGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

: : : I : I I I I I I I I I I : I : : I I I I : I : I I : : I I I I I : I 
Db 887 EQGKTIIITIHQPRSELFKRFGNVLLLAKSGRTAFNGSPDEMIAYFTELGYNCPSFTNVA 946

Qy 300 DFYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILEN IERARYLKT 351 

I I : : I I I I : I I : : I I : I I : : I : I : : : I I : : I : 

Db 947 DFFLDLISVNTQNEQNEISSRARVEKILSAWKAN MDNESLSPTPISEKQQYSQE 1000 

Qy 352 LPMVPFK--TKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQ 409

: : I : I : : I | : : : : I : I : I : : I : 

Db 1001 SFFTEYSEFVRKPANLVLAYIVNVKRQFTTTRRSFDSLMARIAQIPGLGVIFALFFAPVK 1060

Qy 410 NNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYV 469 

: I :: : I : I I : I : I I I : : I I =1 I : I I I I : 

Db 1061 HNYT--SISNRLGLAQEST-ALYFVGMLGNLACYPTERDYFYEEYNDNVYGIAPFFLAYM 1117

Qy 470 LHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLI---GEFLTLVLLGIVQ 526

| | I : I : I : : : III I III: : : M I : : : 

Db 1118 TLELPLSALASVLYAVFTVLACGL-PRTA--GNFFATVYCSFIVTCCGERLGIMTNTFFE 1174

Qy 527 NPN-IVNSIVALLSI SGLL-IGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNE 580 

I : I I I : I I I Ml: : I II I I I" 

Db 1175 RPGFWNCISIILSIGTQMSGLMSLG MSRVLKGFNYLNPVGYTSMIIINFA 1225 

Qy 581 FYG-LNFTC— GGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTANFLILYGFIP 633 

I I I II I I I : : M I : | : I I : 

Db 1226 FPGNLKLTCEDGGKNS DGTCEFANGH DVLVSYGLVRNTQK 1265 

Qy 634 --ALVILGIVIFKVRDYLI 650

: : : : I : : : : I 
Db 1266 YLGIIVCVAIIYRLIAFFI 1284
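In each `Qy`/`Db` line, the numbers on either side of the sequence are the first and last residue positions covered, so `end - start + 1` should equal the number of residue letters once `-` gap characters are excluded. A line that fails this check has probably lost or merged characters during text extraction (for example `VV` read as `W`). The sketch below is a hypothetical helper for flagging such lines; it assumes a clean line with no spaces inside the sequence and upper-case residue letters.

```python
import re

# "Qy 352 LPMVPFK--TKDPP... 409": label, start, aligned sequence, end
ALIGN_RE = re.compile(r"^(Qy|Db)\s+(\d+)\s+([A-Z-]+)\s+(\d+)$")


def span_is_consistent(line: str) -> bool:
    """True if end - start + 1 equals the number of residues (gaps excluded)."""
    m = ALIGN_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a clean Qy/Db line: {line!r}")
    start, seq, end = int(m.group(2)), m.group(3), int(m.group(4))
    residues = len(seq) - seq.count("-")
    return end - start + 1 == residues
```

A line covering 60 positions but carrying only 59 residue letters returns `False`, which marks it for manual checking against the source record.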



RESULT 4 
T47652 



ABC transporter-like protein - Arabidopsis thaliana 

N;Alternate names: protein T26I12.10 

C;Species: Arabidopsis thaliana (mouse-ear cress)
C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 19-May-2000
C;Accession: T47652

R;Monfort, A.; Casacuberta, E.; Puigdomenech, P.; Mewes, H.W.; Lemcke, K.;
Mayer, K.F.X.; Quetier, F.; Salanoubat, M.
submitted to the Protein Sequence Database, February 2000
A;Reference number: Z24471
A;Accession: T47652
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-725 <MON>
A;Cross-references: EMBL:AL132954
A;Experimental source: cultivar Columbia; BAC clone T26I12

C;Genetics:
A;Map position: 3
A;Note: T26I12.10

C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein
F12L6.1; ATP-binding cassette homology 

Query Match 18.0%; Score 606.5; DB 2; Length 725; 

Best Local Similarity 28.9%; Pred. No. 3.2e-37; 

Matches 173; Conservative 122; Mismatches 226; Indels 77; Gaps 



Qy 62 QQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCEL 121
: | : : I 1 1 1 I I : : 1 1 : 1 1 : 1 1 : 1 1 : 1 1 : : 1 1 : 1 : 1 1 1 s 1 1 : =
Db 92 RQNGVKTLLDDVSGEASDGDILAVXGASGAGKSTLIDAIAGRVAE-GSLRGSWLNGEKV 150

Qy 122 RRDQFQDCFS-YVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLS 179
: : 1 1 1 : 1 1 : 1 1 1 : 1 1 1 : : 1 1 1 1 : : 1 1 1 : : : 1 1
Db 151 LQSRLLKVISAYVMQDDLLFPMLTVKETLMFASEFRLPRSLSKSKKMERVEALIDQLGLR 210

Qy 180 HVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL 239
: 1 : : 1 1 1 : 1 1 1 1 1 1 1 1 1 : : 1 1 1 : 1 1 1 1 1 : 1 ! 1 1 : 1 1 :
Db 211 NAANTVIGDEGHRGVSGGERRRVSIGIDIIHDPIVLFLDEPTSGLDSTNAFMWQVLKRI 270

Qy 240 ARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299
| : | M : : | | | I : : : 1 : : 1 1 : 1 : 1 1 1 : 1 : 1 1 1 : : 1 1 1 1 1
Db 271 AQSGSIVIMSIHQPSARIVELLDRLIILSRGKSVFNGSPASLPGFFSDFGRPIPEKENIS 330

Qy 300 DFYMDLTSVDTQSREREIE TYKRVQMLECAF 330
: 1 :| 1 1 1 : 1 III:
Db 331 EFALDLV RELEGSNEGTKALVDFNEKWQQNKISLIQSAPQTNKLDQDRSLSL 382

Qy 331 KESDIYHKILENIERARYL KTLPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNK 384
||: | :: | : : 1 : 1 :| : :l :l :l :|
Db 383 KEA INASVSRGKLVSGSSRSNPTSMETVSSYANPSLF-ETFILAKRYMKNWIRMP 436

Qy 385 QAVIMRLVQNLIMGLFL--IFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNL 442
: | | : : : 1 1 : : : I : : | : I I I : I : 1 : 1 1 : 1 : 1 :
Db 437 ELVGTRIATVMVTGCLLATVYWKL DHTPRGA-QERL-TLFAFWPTMFYCCLDNVPV 491

Qy 443 FPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGY 502
| | : : | : I : : : : 1 II : : : : I I : : : 1 1 : 1 1 : 1 :
Db 492 FIQERYIFLRETTHNAYRTSSYVISHSLVSLPQLLAPSLVFSAITFWTVGLSGGLEGFVF 551



Qy 503 FSAALLAPHLIGEFLTLVLLGIVQNPNIWS-IVALLSISGLLIGSGFIRNIQEMPIPLK 561 

: : I I : : hi III: :|:: :: |: III I :l 
Db 552 YCLLIYASFWSGSSWTFISGW — PNIMLCYMVSITYLAYCLLLSGFYVNRDRIPFYWT 609 

Qy 562 ILGYFTFQKYCCEILWNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATS 619 

I : I I I : : : I I I : I : I I I : I I I 

Db 610 WFHYISILKYPYEAVLINEF— DDPSRCFVRGVQVFDSTLLGGVS 652 



RESULT 5 
S19421 

ATP-dependent permease ADP1 precursor - yeast (Saccharomyces cerevisiae)
N;Alternate names: protein YCR011c; protein YCR105
C;Species: Saccharomyces cerevisiae
C;Date: 31-Mar-1992 #sequence_revision 31-Mar-1992 #text_change 19-Jan-2001
C;Accession: S19421; S40914
R;Goffeau, A.; Purnelle, B.; Skala, J.
submitted to the Protein Sequence Database, March 1992
A;Reference number: S19420
A;Accession: S19421
A;Molecule type: DNA
A;Residues: 1-1049 <GOF>
A;Cross-references: EMBL:X59720; NID:g1907116; PIDN:CAA42328.1; PID:g1907154;
GSPDB:GN00003; MIPS:YCR011c
R;Purnelle, B.; Skala, J.; Goffeau, A.
Yeast 7, 867-872, 1991
A;Title: The product of the YCR105 gene located on the chromosome III from
Saccharomyces cerevisiae presents homologies to ATP-dependent permeases.
A;Reference number: S40914; MUID:92160395; PMID:1789009
A;Accession: S40914
A;Status: not compared with conceptual translation
A;Molecule type: DNA
A;Residues: 1-1049 <PUR>
R;Skala, J.; Purnelle, B.; Goffeau, A.
Yeast 8, 409-417, 1992
A;Title: The complete sequence of a 10.8 kb segment distal of SUF2 on the right
arm of chromosome III from Saccharomyces cerevisiae reveals seven open reading
frames including the RVS161, ADP1 and PGK genes.
A;Reference number: S25353; MUID:92327849; PMID:1626432
A;Contents: annotation
C;Genetics:
A;Gene: SGD:ADP1; MIPS:YCR011c
A;Cross-references: SGD:S0000604; MIPS:YCR011c
A;Map position: 3R
C;Superfamily: ATP-dependent permease ADP1; ATP-binding cassette homology
C;Keywords: ATP; glycoprotein; nucleotide binding; P-loop; transmembrane protein
F;1-25/Domain: signal sequence #status predicted <SIG>
F;26-1049/Product: ATP-dependent permease ADP1 #status predicted <MAT>
F;26-324/Domain: extracellular #status predicted <EXT>
F;325-341/Domain: transmembrane #status predicted <TM1>
F;406-607/Domain: ATP-binding cassette homology <ABC>
F;423-430/Region: nucleotide-binding motif A (P-loop)
F;550-557/Region: nucleotide-binding motif B
F;794-810/Domain: transmembrane #status predicted <TM2>
F;829-845/Domain: transmembrane #status predicted <TM3>
F;878-894/Domain: transmembrane #status predicted <TM4>
F;909-925/Domain: transmembrane #status predicted <TM5>
F;938-954/Domain: transmembrane #status predicted <TM6>
F;1025-1041/Domain: transmembrane #status predicted <TM7>
F;50, 114, 165, 221/Binding site: carbohydrate (Asn) (covalent) #status predicted
F;429/Binding site: ATP (Lys) #status predicted

Query Match 17.9%; Score 602.5; DB 1; Length 1049;

Best Local Similarity 26.5%; Pred. No. 1e-36;

Matches 191; Conservative 130; Mismatches 227; Indels 173; Gaps 25; 

Qy 38 SLGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLL 97

: I : : : I I I I I : : : I : : I : : I I I : i : I I I : I I I I I I 

Db 383 TLSFENITYSV PSINSDGVEE TVLNEISGIVKPGQILAIMGGSGAGKTTLL 433 

Qy 98 DAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLAL 157 

| :: : I : I I : I : I I I : I I : I I I I : I I I I I : : I : I I 

Db 434 DILAMK-RKTGHVSGSIKVNGISMDRKSFSKIIGFVDQDDFLLPTLTVFETVLNSALLRL 492

Qy 158 CRS-SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMM 216 

:: I : : I I : I I : : I :: I I : I I I I I : I I I I I I : I : I h 

Db 493 PKALSFEAKKARVYICVTjEELRIIDIKDRIIGNEFDRGISGGEKRRVSIACELVTSPLVLF 552 

Qy 217 LDEPTTGLDCMTANQIVLLLAELAR-RDRIVIVTIHQPRSELFQHFDKIAILTYGELVFC 275

11111:111 || :: I |: :| ::::|||||| :| III: :|: l|:|: 
Db 553 LDEPTSGLDASNANNVIECLVRLSSDYNRTLVLSIHQPRSNIFYLFDKLVLLSKGEMVYS 612 

Qy 276 GTPEEMLGFFNNCGYPCPEHSNPFDFYMDLT SVD- 309 

I ::: I I II II:: I I: :|:| : i 
Db 613 GNAKKVSEFLRNEGYICPDNYNIADYLIDITFEAGPQGKRRRIRNISDLEAGTDTNDIDN 672 

Qy 310 TQS REREI ET YKRVQMLECA 329

II I : : I 

Db 673 TIHQTTFTSSDGTTQREWAHLAAHRDEIRSLLRDEEDVEGTDGRRGATEIDLNTKLLHDK 732 

Qy 330 FKESDIYHKILENI ERARYLK-TLPMVPFKTKDPPGMFGKLGVLLRRVTRNL 380 

: I : I I : : : I I : I II : I : I : I I : I : 

Db 733 YKDSVYYAELSQEIEEVLSEGDEESNVLNGDLP TGQQSAGFLQQLSILNSRSFKNM 788 

Qy 381 MRN KQAVI MRLVQN L IMGLFL - - 1 F YL L RVQNNT L KGAVQD RVGL L YQ LV GATPYTG 435 

II : :: : ::: III ::| : :| : I |:|:|| : :: I :|| 
Db 789 YRNPKLLLGNYLLTILLSLFLGTLYYNV SNDISG-FQNRMGLFFFILTYFGFVTFTG 844 

Qy 436 MLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL HVLPFSVIATVIFSSVCYWT 490 

: : | : | : : | : | I I I : I : I I : : : I : I 

Db 845 L S S FALERI I FI KERSNNYYS P LAYYISKIMSEWPLRWPPILLSLIVYPM 896 

Qy 491 LGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIV QNPNIVNSIVALLS ISGLL 544 

II : I : :| :| | :: :||: I :|: |:: II III 

Db 897 TGLNMKDNAF-FKCIGILILFNLGISLEILTIGIIFEDLNNSIILSVLVLLGSLLFSGLF 955 

Qy 545 IGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEF YGLNFTCGGSNTSM 595 

|:|| : | | | : I I I : : I I I I I I 
Db 956 INTKNITN VAFKYLKNFSVFYYAYESLLINEVKTLMLKERKYGLNI 1001 

Qy 596 LNHPMCAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGI VIFKVRDY 648 

I I I I I : I I : : : I I : I : I 

Db 1002 EVP GAT 1 LSTFGFWQNLVFDI KI LALFNWFLIMGY 1038 



Qy 649 L 649

Db 1039 L 1039



RESULT 6 
B96573 

protein F12M16.17 [imported] - Arabidopsis thaliana 
C;Species: Arabidopsis thaliana (mouse-ear cress)

C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 23-Mar-2001 
C;Accession: B96573 

R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.;
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.;
Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L.;
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K.; Dunn, P.; Etgu, P.;
Feldblyum, T.V.; Feng, J.; Fong, B.; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.;
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L.
Nature 408, 816-820, 2000

A;Authors: Hunter, J.L.; Jenkins, J.; Johnson-Hopson, C.; Khan, S.; Khaykin, E.;
Kim, C.J.; Koo, H.L.; Kremenetskaia, I.; Kurtz, D.B.; Kwan, A.; Lam, B.;
Langin-Hooper, S.; Lee, A.; Lee, J.M.; Lenz, C.A.; Li, J.H.; Li, Y.; Lin, X.;
Liu, S.X.; Liu, Z.A.; Luros, J.S.; Maiti, R.; Marziali, A.; Militscher, J.;
Miranda, M.; Nguyen, M.; Nierman, W.C.; Osborne, B.I.; Pai, G.; Peterson, J.;
Pham, P.K.; Rizzo, M.; Rooney, T.; Rowley, D.; Sakano, H.

A;Authors: Salzberg, S.L.; Schwartz, J.R.; Shinn, P.; Southwick, A.M.; Sun, H.;
Tallon, L.J.; Tambunga, G.; Toriumi, M.J.; Town, C.D.; Utterback, T.;
van Aken, S.; Vaysberg, M.; Vysotskaia, V.S.; Walker, M.; Wu, D.; Yu, G.;
Fraser, C.M.; Venter, J.C.; Davis, R.W.

A;Title: Sequence and analysis of chromosome 1 of the plant Arabidopsis.
A;Reference number: A86141; MUID:21016719; PMID:11130712
A;Accession: B96573
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-590 <STO>
A;Cross-references: GB:AE005173; NID:g7769856; PIDN:AAF69534.1; GSPDB:GN00141
C;Genetics:
A;Gene: F12M16.17

A;Map position: 1 

C;Superfamily: fruit fly white protein; ATP-binding cassette homology 

Query Match 17.7%; Score 597; DB 2; Length 590; 

Best Local Similarity 28.9%; Pred. No. 1.2e-36; 

Matches 187; Conservative 111; Mismatches 248; Indels 100; Gaps 18; 
Qy 32 GT EARH S L GVLHVS Y S VS N RVG P WWN I K S C QQKWDRQILKDVSLYIESGQIMCILGS 88
t I • I • « I i i • i • i • * 1111)1 i • i ii
Db 12 GREISYRLETKNLSYRIGGNTPKFSNL--CGLLSEKEEKVILKDVSCDARSAEITAIAGP 69

Qy 89 SGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRET 148
I I : I I I I I I ... I . . I * l*l ill • • • • -iii t i i • i i
Db 70 SGAGKTTLLEILAGKVSH-GKVSGQVLVNGRPMDGPEYRRVSGFVPQEDALFPFLTVQET 128

Qy 149 LRYTAMLALCRSSADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQL 208
| | : | : I I I I I ::: I I I I I I I II : I I I I I I I I I I I : I
Db 129 LTYSALLRLKTKRKD-AAAKVKRLIQELGLEHVADSRIGQGSRSGISGGERRRVSIGVEL 187

Qy 209 LQDPKVMMLDEPTTGLDCMTANQIVLLLAELA-RRDRIVIVTIHQPRSELFQHFDKIAIL 267
: || I:::||||:||l :| |:| || :: :: : :::|IMI : : hi :l
Db 188 VHDPNVILIDEPTSGLDSASALQWTLLKDMTIKQGKTIVLTIHQPGFRILEQIDRIVLL 247

Qy 268 TYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMD LT S VDTQS REREI ET YKRV 323 

: I : I I : : I : I I : : : I I : III III I 

Db 248 SNGMWQNGSVYSLHQKIKFSGHQIPRRVNVLEYAIDIAGSLEPIRTQSC-REISCYGHS 306 

Qy 324 QMLECAF KESDIY-HKILENIERARYLKTLPMVPFKTKDPPGMFGKLGVLLRR 375 

: : : : | | : : : | | : : : I : I 

Db 307 KTWKSCYISAGGELHQSDSHSNSVLEEVQ ILGQR 340 

Qy 376 VTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDRVG LLYQLVGAT 431 

: I : I I I I : I I I I I III : I II : I I : : I 
Db 341 SCKNI FRTKQLFTTRALQAS IAGLI LGS I YLNVGNQKKEAKVL-RTGFFAFI LTFLLS ST 399 

Qy 432 PYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTL 491

: : | | : : I : I : I I I : I I : I : : : I : : I I : 

Db 400 TEGLPIFLQDRRILMRETSRRAYRVLSYVLADTLIFIPFLLIISMLFATPVYWLV 454 

Qy 492 GLYPEVARFGYFS AALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSISGLL — 544 

II I: I III I:: : I II II h MM: 

Db 455 GLRRELDGFLYFSLVIWIVLLMSNSFVACFSALV PNF IMGTSVISGLMGS 504 

Qy 545 — I GSGFI RNIQEMPI PLKI LGYFT FQKYCCEI LWNEFYGLNFTCGGSNTSMLNHPMCA 602 

: I I : : I : : : I : II I I : : I I : I I 
Db 505 FFLFSGYFIAKDRIPVYWEFMHYLSLFKYPFECLMINEYRGDVFL 549 

Qy 603 ITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDY 648 

I I : : I I : II Mil II 

Db 550 KQQDLKESQKWSNLGIMASFIVGYRVLGFFILWYRCY 586 



RESULT 7 
E96742 

probable ABC transporter F17M19.11 [imported] - Arabidopsis thaliana 
C;Species: Arabidopsis thaliana (mouse-ear cress)

C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 23-Mar-2001
C;Accession: E96742

R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.;
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.;
Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L.;
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K.; Dunn, P.; Etgu, P.;
Feldblyum, T.V.; Feng, J.; Fong, B.; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.;
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L.
Nature 408, 816-820, 2000

A;Authors: Hunter, J.L.; Jenkins, J.; Johnson-Hopson, C.; Khan, S.; Khaykin, E.;
Kim, C.J.; Koo, H.L.; Kremenetskaia, I.; Kurtz, D.B.; Kwan, A.; Lam, B.;
Langin-Hooper, S.; Lee, A.; Lee, J.M.; Lenz, C.A.; Li, J.H.; Li, Y.; Lin, X.;
Liu, S.X.; Liu, Z.A.; Luros, J.S.; Maiti, R.; Marziali, A.; Militscher, J.;
Miranda, M.; Nguyen, M.; Nierman, W.C.; Osborne, B.I.; Pai, G.; Peterson, J.;
Pham, P.K.; Rizzo, M.; Rooney, T.; Rowley, D.; Sakano, H.

A;Authors: Salzberg, S.L.; Schwartz, J.R.; Shinn, P.; Southwick, A.M.; Sun, H.;
Tallon, L.J.; Tambunga, G.; Toriumi, M.J.; Town, C.D.; Utterback, T.;
van Aken, S.; Vaysberg, M.; Vysotskaia, V.S.; Walker, M.; Wu, D.; Yu, G.;
Fraser, C.M.; Venter, J.C.; Davis, R.W.
A;Title: Sequence and analysis of chromosome 1 of the plant Arabidopsis.
A;Reference number: A86141; MUID:21016719; PMID:11130712
A;Accession: E96742
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-609 <STO>
A;Cross-references: GB:AE005173; NID:g6978921; PIDN:AAF34313.1; GSPDB:GN00141

C;Genetics:
A;Gene: F17M19.11
A;Map position: 1

C;Superfamily: fruit fly white protein; ATP-binding cassette homology

Query Match 17.7%; Score 595; DB 2; Length 609; 

Best Local Similarity 29.6%; Pred. No. 1.8e-36; 

Matches 178; Conservative 121; Mismatches 235; Indels 68; Gaps 17; 

Qy 66 DRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQ 125 

:| || |: I I: I :| I I I I I I :| I I : I:: I I I : I I : : : I " : 
Db 27 ERTILSGVTGMISPGEFMAVLGPSGSGKSTLLNAVAGRLHGS-NLTGKILINDGKITKQT 85 

Qy 126 FQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLSHVADQ 184

: :| I I: I I I I M I : I : I I I I : I : I = I = = I I M : 
Db 86 LKRT-GFVAQDDLLYPHLTVRETLVFVALLRLPRSLTRDVKLRAAESVISELGLTKCENT 144 

Qy 185 MIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELAR-RD 243 

: : I : I I I I I I : I I I I I : I I : I : : : I I I I I : I I I I : : I I I I I • 
Db 145 WGNTFIRGISGGERKRVSIAHELLINPSLLVLDEPTSGLDATAALRLVQTLAGLAHGKG 204 

Qy 244 RIVI VTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYM 303 

: I : : I I I I I : I I I I : : I : I : : I I : : : I : I : MM: 
Db 205 KTWTSIHQPSSRVFQMFDTVLLLSEGKCLFVGKGRDAMAYFESVGFSPAFPMNPADFLL 264 

Qy 304 DLTSVDTQS RERE IETYKRVQMLECAFKESDIYHKILENIERARYL 349 

|| : |: Ml : I Ml :: | :| II:: 
Db 265 DIANGVCQTDGVTEREKPNVRQTLVTAYDTLLAPQVKTCI EVSHFPQDN ARFV 317 

Qy 350 KTL PMVPFKT KD P P GMFGKLGVLLRRVTRN LMRNKQAVIMRLVQ NLIMGLFLI FYL 405 

|| | I : I : I I I : : I : : : : I : I : : : M 

Db 318 KTRVNGGGITTCIATWFSQLCILLHRLLKE-RRHESFDLLRI FQWAASILCGLMWWHSD 376 

Qy 406 LRVQNNTL KGAVQ DRVGLL YQLVGAT P YT GML NAVNLFPMLRAVSDQESQDGLYHKW 462 

| IIIMIM: : : I : I III II I I : : I I : I 
Db 377 YR DVHDRLGLLFFI S I FWGVLP S FNAVFTFPQERAI FTRERAS GMYTLS 425 

Qy 463 QMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLL 522

: | : || I :: I : I I : I I : I : M I : I I I 
Db 426 SYFMAHVLGSLSMELVLPASFLTFTYWMVYLRPGIVPFLLTLSvliLLYVTiASQGLGLALG 485 

Qy 523 GIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFY 582 

: : : : | I : : : : I I : : : I : : I : M Ml - 
Db 4 86 AAIMDAKKAST I VTVTMLAFVLT GG Y YVN KV P S GMVWMKYVSTT FYC YRLLVAI Q Y- 541 

Qy 583 GLNFTCGGSNTSMLNHPMC AITQGVQFIEKTCPGATSRFTA— NFLILY 629 

II :| I | : | : | :| : I M : II : : 

Db 542 -GSGEEILRMLGCDSKGKQGASAATSAGCRFVEEEVIGDVGMWTSVGVLFLMFF 594 



Qy 630 GF 631
|:
Db 595 GY 596



RESULT 8 
FYFFW 

white protein - fruit fly (Drosophila melanogaster)
C;Species: Drosophila melanogaster
C;Date: 31-Dec-1990 #sequence_revision 17-Feb-1995 #text_change 19-Jan-2001
C;Accession: S08635; S07263; S10240
R;Pepling, M.; Mount, S.M.
Nucleic Acids Res. 18, 1633, 1990
A;Title: Sequence of a cDNA from the Drosophila melanogaster white gene.
A;Reference number: S08635; MUID:90221897; PMID:2109311
A;Accession: S08635
A;Molecule type: mRNA
A;Residues: 1-687 <PEP>
A;Cross-references: EMBL:X51749; NID:g8825; PIDN:CAA36038.1; PID:g8826
R;O'Hare, K.; Murphy, C.; Levis, R.; Rubin, G.M.
J. Mol. Biol. 180, 437-455, 1984
A;Title: DNA sequence of the white locus of Drosophila melanogaster.
A;Reference number: S07263; MUID:85134865; PMID:6084717
A;Accession: S07263
A;Molecule type: DNA
A;Residues: 1-24,'LIFEIPYHCRVTAD',30-
334,'ITLHLNSYPAWVPSVLPTTIRRTFTYRCWPLCPDGRSSPVIGSPRYG',372-687 <OHA1>
A;Cross-references: EMBL:X02974
A;Experimental source: strain Canton S
R;O'Hare, K.
submitted to the EMBL Data Library, June 1985
A;Reference number: S10240
A;Accession: S10240
A;Molecule type: DNA
A;Residues: 1-24,'LIFEIPYHCRVTAD',30-687 <OHA2>
A;Cross-references: EMBL:X02974; NID:g10873; PIDN:CAA26716.1; PID:g10874
A;Experimental source: strain Canton S
C;Genetics:
A;Gene: white; w
A;Cross-references: FlyBase:FBgn0003996
A;Introns: 24/3; 116/1; 334/2; 439/3; 483/3
C;Superfamily: fruit fly white protein; ATP-binding cassette homology
C;Keywords: ATP; glycoprotein; nucleotide binding; P-loop; transmembrane protein
F;113-317/Domain: ATP-binding cassette homology <ABC>
F;130-137/Region: nucleotide-binding motif A (P-loop)
F;261-265/Region: nucleotide-binding motif B
F;67,93,472,554,651/Binding site: carbohydrate (Asn) (covalent) #status
predicted

Query Match 17.5%; Score 589; DB 1; Length 687; 

Best Local Similarity 27.5%; Pred. No. 6e-36; 

Matches 200; Conservative 129; Mismatches 255; Indels 142; Gaps 25; 



Qy 11 GARGP HINRGSLSSLEQ GSVTG TEAR 36

Db 13 GSKHPSAEHLNNGDSGAASQSCINQGFGQAKNYGTLLPPSPPEDSGSGSGQLAENLTYAW 72

Qy 37 HSLGVLHVSYSVSNRVGPWWNI KSCQQKW DRQILKDVSLYIESGQIMCI 85
| : : : : I : I I I : : : : I I : I I : : : :
Db 73 HNMDIFGAVNQPGSGWRQLVNRTRGLFCNERHIPAPRKHLLKNVCGVAYPGELLAV 128



Qy 86 LGSSGSGKTTLLDAISGRLRR — TGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSL 143 

: | | | | : | | M I I : I : : I : : I :ll : : I :ll I 1 = 1= M 

Db 129 MGSSGAGKTTLLNALAFRSPQGIQVSPSGMRLLNGQPVDAKEMQARCAYVQQDDLFIGSL 188 

Qy 144 TVRETLRYTAMLALCRS SADFYNK KVEAVMTELSLSHVADQMIG-SYNFGGISSGER 199 

| | | | : M : : I I : : I : I : I I I I I : I I I : I I I I 

Db 189 TAREHLIFQAMVRMPRHLT--YRQRVARVDQVIQELSLSKCQHTIIGVPGRVKGLSGGER 246

Qy 200 RRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQ 259

: | :: | :: I II :: : I I I I : II I I I : : I : I : I : : : : I I : I I I I I MM: 
Db 247 KRLAFASEALTDPPLLICDEPTSGLDSFTAHSWQVLKKLSQKGKTVILTIHQPSSELFE 306 

Qy 260 HFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIET 319 

MM:: | : I I II I : I I : I II : I I I M : : : I M I I : 

Db 307 LFDKILLMAEGRVAFLGTPSEAVDFFSWGAQCPTNYNPADFWQVLAV VPGREIES 363 

Qy 320 YKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTKDPP GMFGKLGV 371 

I : : II I : :: I : II I : I I : 

Db 364 RDRI AKI CDNFAI S KVARDMEQLLATKNLE KPLEQPENGYTYKATWFMQFRA 415 

Qy 372 LLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGAT 431

:| I :::: I :||:| :: : II : I I 11= I ' 

Db 416 VLWRSWLSVLKEPLLVKVRLIQTTMVAI-L^ 473 

Qy 432 PYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTL 491

: : : | : | I : I : : II I : M : : : I : : : I : 

Db 474 TFQNVFATINVFTSELPVFMREARSRLYRCDTYFLGKTIAELPLFLTVPLVFTAIAYPMI 533 

Qy 492 GLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLS 539 

|| || | | | : : | | : I I : I 

Db 534 GLRAGVLHF FNCLALVTLV— ANVSTSFGYLISCASSSTSMALSV 57 6 

Qy 540 ISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGL— NFTCGGSN 592 

I Mill :M II I I :: :l I MM:: : :| || 

Db 577 GPPVIIPFLLFGGFFLNSGSVPVYLKWLSYLSWFRYANEGLLINQWADVEPGEISCTSSN 636 

Qy 593 T SMLNHPMCAI TQGVQFI EKTCPGA TSRFTANFLILYGFIPALVILGIVIFKVR 646 

| III : I I : I I I :: I I I I I M I 

Db 637 T TCPSSGKVILETLNFSAADLPL-DYV-GLAIL-IVSFRVL 674

Qy 647 DYLISR 652 

I I I 

Db 675 AYLALR 680 



RESULT 9 
T47650 

ABC transporter-like protein - Arabidopsis thaliana 

N;Alternate names: protein T15C9.110
C;Species: Arabidopsis thaliana (mouse-ear cress)
C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 19-May-2000
C;Accession: T47650

R;Mewes, H.W.; Rudd, S.; Lemcke, K.; Mayer, K.F.X.
submitted to the Protein Sequence Database, April 2000
A;Reference number: Z24470
A;Accession: T47650
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-708 <MEW>
A;Cross-references: EMBL:AL132970
A;Experimental source: cultivar Columbia; BAC clone T15C9

C;Genetics:
A;Map position: 3
A;Note: T15C9.110

C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein
F12L6.1; ATP-binding cassette homology

Query Match 17.4%; Score 587; DB 2; Length 708; 

Best Local Similarity 27.8%; Pred. No. 8.9e-36; 

Matches 159; Conservative 129; Mismatches 229; Indels 54; Gaps 15; 



Qy 67 RQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQF 126
: : | | : : I : I : : 1 1 1 1 : 1 1 : 1 1 : 1 1 : : 1 1 : : | : | | : I I : : : :
Db 89 KTLLDDITGEARDGEILAVLGGSGAGKSTLIDALAGRVAE-DSLKGTVTLNGEKVLQSRL 147

Qy 127 QDC FS - YVLQ S DVFL S S LT VRET LRYT AMLALCRS SAD F YNKKVEAVMT ELSLSHVA 182
1 11:1 I: llhlll : : III MM 1 I :| 1 : 1
Db 148 LKVISAYVMQDDLLFPMLTVKETLMFASEFRLPRSLPK--SKKMERVETLIDQLGLRNAA 205

Qy 183 DQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARR 242
I : I I I : I 1 1 1 1 1 1 1 1 :: 1 1 :: II 1 1 1 : 1 1 1 1 : 1 : 1 = 1 s
Db 206 DTVIGDEGHRGVSGGERRRVSIGIDIIHDPILLFLDEPTSGLDSTNAFMWQVLKRIAQS 265

Qy 243 DRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFY 302
:||::|||| : : |:: ||::|: II hi : II:: 1 1 II 1 :l
Db 266 GSWIMSIHQPSARIIGLLDRLIILSHGKSVFNGSPVSLPSFFSSFGRPIPEKENITEFA 325

Qy 303 MDLTS VDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYL- 349
: I : I : : : : : 1 : : 1 1 : | : : | : :
Db 326 LDVIRELEGSSEGTRDLVEjtNhiKWyyjNy 1AKA1 iyb KV o jjJ\ti/\ ±/\/\d vokljJ\j_ivo

Qy 350 KTLPMVPFKT KDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLI F 403
Db 381 GSSGANPISMETVSSYANPP--LAETFILAKRYIKNWIRTPELIGMRIGTVMVTGLLLAT 438

Qy 404 YLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQ 463
1 : : I 1 : 1 1 1 : 1 : 1 : : 1 : : : : 1 1 : : 1 : 1
Db 439 VYWRL-DNTPRGA-QERMG-FFAFGMSTMFYCCADNIPVFIQERYIFLRETTHNAYRTSS 495

Qy 464 MLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLG 523
:::: | || : :: |:: :||:|| : 1 h : 1 1 : : 1
Db 496 WISHALVSLPQLLALSIAF7\ATTFWTVGLSGGLESFFYYCLIIYAAFWSGSSIVTFISG 555

Qy 524 IVQNPNIVNS-IVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFY 582
:: ||:: | :| : :| |: II 1 :|: 1 : II 1 :::IM
Db 556 LI--P^r\/MMSYMVTIAYLSYCLLLGGFYINRDRIPLYWIWFHYISLLKYPYEAVLINEF- 612

Qy 583 GLNFTCGGSNTSMLNHPMCAITQGVQFIEKT 613
: 1 : 1 1 1 : 1
Db 613 DDPSRCFVKGVQVFDGT 629





RESULT 10 



G02068 

white homolog - human 

C;Species: Homo sapiens (man)
C;Date: 21-Dec-1996 #sequence_revision 06-Jun-1997 #text_change 02-Feb-2001
C;Accession: G02068

R;Croop, J.M.; Tiller, G.; Fletcher, J.A.; Lux, M.; Raab, E.; Goldenson, D.;
Arciniegas, S.; Son, D.; Wu, R.
submitted to the EMBL Data Library, August 1995
A;Reference number: H00769
A;Accession: G02068
A;Status: preliminary; translated from GB/EMBL/DDBJ
A;Molecule type: mRNA
A;Residues: 1-638 <CRO>
A;Cross-references: EMBL:U34919; NID:g1314276; PIDN:AAC51098.1; PID:g1314277
C;Genetics:
A;Gene: white
C;Superfamily: fruit fly white protein; ATP-binding cassette homology
C;Keywords: ATP; nucleotide binding; P-loop
F;61-253/Domain: ATP-binding cassette homology <ABC>
F;78-85/Region: nucleotide-binding motif A (P-loop)

Query Match 17.2%; Score 580; DB 2; Length 638; 

Best Local Similarity 28.1%; Pred. No. 2.6e-35; 

Matches 164; Conservative 125; Mismatches 241; Indels 54; Gaps 15;

Qy 23 SSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQI 82 

|||:: || : I I I I I I I I :: I : : I I : I II- 

Db 27 SSLPRRAAVNIEFR DLSYSVPE— GPWW RKKGYKTLLKGISGKFNSGEL 73 

Qy 83 MCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQ--FQDCFSYVLQSDVFL 140

: |:| ||:||:||:: ::| I II ::l I :ll III h l::l I: I 

Db 74 VAIMGPSGAGKSTLMNILAG-YRETG-MKGAVLING--LPRDLRCFRKVSCYIMQDDMLL 129

Qy 141 SSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERR 200

: I I I : : : I : : : I I I I : II : I I : I : 

Db 130 PHLTVQEAMMVSAHLKL-QEKDEGRREMVKEILTALGLLSCANTRTGS LSGGQRK 183 

Qy 201 RVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQH 260

| : : I I : I : : I I I I I I I : I I I : I : I I : I I : I : I I I I I I : : I I : 
Db 184 RLAIALELVNNPPVMFFDEPTSGLDSASCFQWSLMKGLAQGGRSIICTIHQPSAKLFEL 243 

Qy 261 FDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETY 320

| | : : : | : I : I : I : : : : I I I : I I I I I : : I : : • 

Db 244 FDQLYVLSQGQCVYRGKVCNLVPYLRDLGLNCPTYHNPADFVMEVASGEYGDQNSRLVRA 303 

Qy 321 KRVQMLECAFKES DIYHKILENIERARYLKTLPMVPFKTKDPPGMFG 367 

||: | ::|: | ::: : || | II II 

Db 304 VREGMCDSDHKRDLGGDAEVNPFLWHRPSEEVKQTKRLKGL RKDSSSMEGCHSF 357 

Qy 368 KLGVLL RRVT RN LMRNKQAVIMRL VQN L I MGL FL I F YL L RVQNNT LKGAVQ DRV 421 

: : I : I : : I I : : I : :: : M : I : I II 

Db 358 SASCLTQFCILFKRTFLSIMRDSVLTHLRITSHIGIGLLIGLLYLGIGNEAKK— VLSNS 415 

Qy 422 GLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIATV 481

||:: : :: I I I : I : I : I | | : : I I :: I 

Db 416 GFLFFSMLFIJ^FAALMPTVLTFPLEMGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPV 475



Qy 482 IFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSIS 541 

: I : I I : I I : I I : : I I I : I | : | : : 

Db 476 AYCSIVYWMTSQPSDAVAFVLFAALGTMTSLVAQSLGL-LIGAASTSLQVATFVGPVTAI 534

Qy 542 GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGLN 585 

: I : I M : : I I : : I : : : I I : : : : III: 
Db 535 PVLLFSGFFVSFDTIPTYLQWMSYISYVRYGFEGVILS-IYGLD 577



RESULT 11 
T31958 

hypothetical protein F02E11.1 - Caenorhabditis elegans 
C;Species: Caenorhabditis elegans
C;Date: 29-Oct-1999 #sequence_revision 29-Oct-1999 #text_change 31-Jan-2000
C;Accession: T31958
R;Favello, A.; Scheet, P.
submitted to the EMBL Data Library, July 1997
A;Description: The sequence of C. elegans cosmid F02E11.
A;Reference number: Z21104
A;Accession: T31958
A;Status: preliminary; translated from GB/EMBL/DDBJ
A;Molecule type: DNA
A;Residues: 1-658 <FAV>
A;Cross-references: EMBL:AF016661; PIDN:AAB66050.1; GSPDB:GN00020; CESP:F02E11.1
A;Experimental source: strain Bristol N2; clone F02E11
C;Genetics:
A;Gene: CESP:F02E11.1
A;Map position: 2
A;Introns: 115/3; 158/3; 214/3; 330/3; 368/2; 448/3; 525/1
C;Superfamily: fruit fly white protein; ATP-binding cassette homology

Query Match 17.2%; Score 579.5; DB 2; Length 658; 

Best Local Similarity 27.8%; Pred. No. 2.9e-35; 

Matches 169; Conservative 117; Mismatches 254; Indels 69; Gaps 13; 



Qy 73 VSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGT-LEGEVFVNGCELRRDQFQDCFS 131
II | | : : : : : | | | : I I I I I : : 1 1 1 1 : 1 1 1 1 : : : : : :
Db 79 VSGVAEPGEVTiALMGGSGAGKTTLMN-ILAHLDTNGVEYLGDVTW 137

Qy 132 YVLQSDVFLSSLTWETLRYTAMLALCRSSADFYNKKVE 190
1 1 1 1 : 1 : I 1 1 1 1 1 1 1 1 : : : : : : I I I : : : : 1 : : : 1 1 1
Db 138 YVQQVDLFCGTLTVREQLTYTAHMRMKNATVQQKMERVENVLRDMNLTDCQNTLIGIPNR 197

Qy 191 FGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTI 250
1 1 1 1 1 : : 1 : : 1 : : 1 1 1 1 : : 1 1 1 1 : 1 1 1 1 : : : 1 1 : 1 1 : : : 1 1 :
Db 198 MKGISIGEKKRLAFACEILTDPKILFCDEPTSGLDAFMASEWRALLDLANKGKTIIVATL 257

Qy 251 HQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCG--YPCPEHSNPFDFYMDLTSV 308
1 1 1 1 : 1 : 1 1 : : 1 : 1 : 1 : : 1 1 : 1 : II Mill 1 :
Db 258 HQPSSTVFRMFHKVCFMATGKTVYHGAVDRLCPFFDKLGPDFRVPESYNPADFVMSEISI 317

Qy 309 DTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTKDPPGMFGK 368
1 1 1 1 1:: 1 :: 1 1 1 ::|: ||: II
Db 318 SPETEQEDVTRIEYLIHEYQNSDIGTQMLK KTRTAVDEFGG 358

Qy 369 LG VLL RRVT RNLMRNKQ AVI MRLVQN L I MGL FL I FY L LRVQ 409
: I I : I I I : .: : I II
Db 359 YGDDEDDGESRYNSTFGTQFEILLKRSLRTTFRDPLLLRVRFAQILATAILVGIVNWRVE 418



Qy 410 NNTLKG-AVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAY 468

I I I : I : I :: I : : I I : I : I : : I I I 

Db 419 LKGPTIQNLEGVMYNCARDMTFLFYFPSVNVITSELPVFLREHKSNIYSVEAYFLAK 475 

Qy 469 VLHVLPFSVIATVI FSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNP 528 

III | : | : :: | | I I III : : : I I : 
Db 476 SLAELPQYTILPMIYGTIIYWMAGLVASVTSFLVFVFVCITLTWVAVSIAYVGACIFGDE 535

Qy 529 NIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGLNFTC 588 

: I : : : : hill I : I : : : : : : I : II I : : : : 
Db 536 GLWTFMPMFVLPMLVFG-GFYVNANSIPVYYQYVSFVSWFKHGFEALEANQWKEIDKIS 594

Qy 589 GGSNTSMLNHPMCAITQGVQFIEKTCP GAT S RFTAN FL I L YGFI PALVI 637 

I I : I : I I I II I : Mill: I 

Db 595 GCD LINPLNATTTGY CPASDGPGILTRRGI DTPLYANVLI LFMS FFVYRI 644 

Qy 638 LGIVIFKVR 646 

: I : I I : I 

Db 645 IGLVALKIR 653 



RESULT 12 
T47648 

ABC transporter-like protein - Arabidopsis thaliana 

N;Alternate names: protein T15C9.80 

C;Species: Arabidopsis thaliana (mouse-ear cress)

C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 19-May-2000
C;Accession: T47648

R;Mewes, H.W.; Rudd, S.; Lemcke, K.; Mayer, K.F.X.
submitted to the Protein Sequence Database, April 2000
A;Reference number: Z24470
A;Accession: T47648
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-720 <MEW>
A;Cross-references: EMBL:AL132970

A;Experimental source: cultivar Columbia; BAC clone T15C9
C;Genetics:
A;Map position: 3
A;Note: T15C9.80

C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein 
F12L6.1; ATP-binding cassette homology 

Query Match 17.1%; Score 577; DB 2; Length 720; 

Best Local Similarity 27.8%; Pred. No. 5.1e-35; 

Matches 173; Conservative 120; Mismatches 237; Indels 92; Gaps 18; 

Qy 43 HVSYSVSNR VGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTT 95 

:::|:|| I : II : I :| ::| I : I : : I I : I I I I I : I 

Db 61 NLT YNVSVRRKLDFHDLVPWRRT S FS KTK TLLDNISGETRDGEILAVLGASGSGKST 117 

Qy 96 LLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAML 155 

I : I I :: I : : I : I : I I : I I I : : : I I : I I : I I I I I I : I 

Db 118 LIDALANRIAK-GSLKGTVTLNGEALQSRMLKVI SAYVMQDDLLFPMLTVEETLMFAAEF 176 



Qy 156 ALCRSSADFYNK-KVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKV 214 

Ml | : | : | : : : | : : | : I I I I I I I I I I I I I : : I I I 

Db 177 RLPRSLPKSKKKLRVQALIDQLGIRNAAKTIIGDEGHRGISGGERRRVSIGIDIIHDPIV 236

Qy 215 MMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVF 274

: M | I I : I I I : I : I : I : I I : I :: I I I I : I : : hi I I 

Db 237 LFLDEPTSGLDSTSAFMVVKVLKRIAESGSIIIMSIHQPSHRVLSLLDRLIFLSRGHTVF 296 

Qy 275 CGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSRERE IETYKRVQM 325 

I : I : I I | | | | : | : | : I I III : I I : I 

Db 297 SGSPASLPSFFAGFGNPIPENENQTEFALDLI RELEGSAGGTRGLVEFNKKWQE 350 

Qy 326 LECAFKESD I YHKILENIERARYLK TLPMVP 356

: : | : I : : I : I I : : I I : 

Db 351 MKKQSNPQTLTPPASPNPNLTLKEAISASISRGKLVSGGGGGSSVINHGGGTLAVPA 407

Qy 357 FKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFL--IFYLLRVQNNTLK 414

I : : : | | | I I : : III : : | | : | : I : I : I 
Db 408 FANP FWI EI KTLTRRSI LNSRRQPELLGMRLATVIVTGFI LATVFWRL DNSPK 460

Qy 415 GAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLP 474

I I I : I : I : : I : : I : : I I : : I : I : : I :: : I 

Db 461 G-VQERLG-FFAFAMSTMFYTCADALPVFLQERYIFMRETAYNAYRRSSYVLSHAIVTFP 518 

Qy 475 FSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIV 531 

: : : I : : I : I I : I : : : I I I I hi I : : : 

Db 519 SLIFLSLAFAVTTFWAVGLEGGLMGFLFYCLIILASFWSGSSFVTFLSGW — PHVMLGY 576 

Qy 532 NSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGS 591

:||:|: hill I :| I : II I :: I I I 
Db 577 TIWAILAY — FLLFSGFFINRDRIPQYWIWFHYLSLVKYPYEAVLQNEF 624 

Qy 592 NTSMLNHPMCAITQGVQFIEKT 613 

: I : I I I : : 

Db 625 SDPTECFVRGVQLFDNS 641 
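The percentages in each record's statistics lines can be recomputed from the other numbers in the same record. For RESULT 12, Score 577 divided by 3369 (the perfect score reported for this same 652-residue query in the run header that follows) gives 17.1%, and 173 matches out of 173 + 120 + 237 + 92 = 622 aligned columns gives 27.8%. The Python sketch below checks this arithmetic. Reading "Best Local Similarity" as identities over matches, conservative pairs, mismatches and indels is an inference from the reported numbers, not a documented GenCore definition.

```python
def query_match_pct(score: float, perfect_score: float) -> float:
    """Score as a percentage of the query's self-alignment ("perfect") score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches: int, conservative: int,
                              mismatches: int, indels: int) -> float:
    """Identities as a percentage of all aligned columns (inferred definition)."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Figures copied from the RESULT 12 record (T47648).
print(query_match_pct(577, 3369))
print(best_local_similarity_pct(173, 120, 237, 92))
```

The same two formulas reproduce the figures of RESULTS 13 to 15 and of the human SSG hit (2744.5 / 3369 gives 81.5%; 523 of 652 columns gives 80.2%).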



RESULT 13 
T34391 

hypothetical protein T26A5.1 - Caenorhabditis elegans 
C;Species: Caenorhabditis elegans

C;Date: 29-Oct-1999 #sequence_revision 29-Oct-1999 #text_change 04-Mar-2000
C;Accession: T34391
R;Du, Z.
submitted to the EMBL Data Library, April 1994
A;Description: The sequence of C. elegans cosmid T26A5.
A;Reference number: Z21516
A;Accession: T34391
A;Status: preliminary; translated from GB/EMBL/DDBJ
A;Molecule type: DNA
A;Residues: 1-608 <DUZ>
A;Cross-references: EMBL:U00043; PIDN:AAC77504.1; GSPDB:GN00021; CESP:T26A5.1
A;Experimental source: strain Bristol N2; clone T26A5
C;Genetics:
A;Gene: CESP:T26A5.1
A;Map position: 3
A;Introns: 23/1; 96/3; 243/1; 342/2; 374/3; 403/1; 428/2; 464/3; 494/3; 534/2

C;Superfamily: fruit fly white protein; ATP-binding cassette homology



Query Match 17.0%; Score 573.5; DB 2; Length 608; 

Best Local Similarity 26.2%; Pred. No. 7.4e-35; 

Matches 161; Conservative 118; Mismatches 249; Indels 87; Gaps 14; 

Qy 64 KWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRR 123 

| | : I I : I I I :||::: : : I : I I : I I I I I I : : I : : II : III h 
Db 40 KEKRLLLKNVSGYAKSGELLALMGASGAGKTTLLNMLMCRNLKGLSTEGTITVNGNEMAH 99 

Qy 124 DQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSHVAD 183

: : | :: | : : | | | : | | I I : I : I I : : I I I

Db 100 -KISSISGFAQQEELFVGTLTVKEYLMIQAKLRI-NGSKKLREDRVTDVLHQLKLWKCRD 157

Qy 184 QMIGSY-NFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARR 242

I | III II II:: I ::l :l IMIIMI I : : : I : I :

Db 158 SKIGVIGEKKGISGGEARRLTFACEMLSNPSLLFADEPTTGLDSFMAESVIQILKGIAKT 217

Qy 243 DRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFY 302

I :| Mill |:|:| I :: I I I MIM : II II: I : M :: 
Db 218 GRTIICTIHQPSSQLYQMFHRVIYLANGSTAFQGTPQESISFFEKCGHRVPDEYNPSEWI 277 

Qy 303 MDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTKDP 362 

: : | : | : : | : I : : : : I I : : : I : II 
Db 278 IYKLAVQP GQEKQSNDRIQKIVEQYEDSDHQKRVMEQLS — DVSEKIP 323 

Qy 363 P GMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGA 416 

| : | : : | I : : I I : : : : I : : II I : II : 

Db 324 PPEMHRANVFTQIFALSTRCGIDVWRAPQLTLAKVIQKILFGLFIGLLYLRTPYDA RG 381 

Qy 417 VQDRVGLLYQLVG AT P YT GMLNAVN L F PML RAVS DQESQDGLYH KWQML LAYVLH V 472 

: : I I : I I Ml I MM: M I I I I : I I : : 

Db 382 IHNINGALFFLAGEYIYSTAYAIMFFLNNEFPLVA REYHDGLYNLWTYYFARCISL 437 

Qy 473 LPFSVIATVIFSSVCYWTLGLYPEV ARFGYFSAALLAPHLIGE 515 

: I M : I I : I I I : I I : : 
Db 438 IPLFSTDGLILLFIVYWLIGLNTSVMQVIVASIITVLASQAASAFGIAMSCI 489 

Qy 516 FLTLVLLGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEI 575 

II : : : : | : | I : I I I I | :: | : : :: I 

Db 490 F PTAQMT AVMAS P P LV LFRLFGGLYG NTNTFPAAIRWLQWI SMYRFAFEG 539 

Qy 576 LVVNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTANFLILYGFIPAL 635

MM:: :: : : I : I I I I 
Db 540 LWNQWSEIDDFHSNAKNNWTNSTTNDVLDYFAFSESAIP L 580 



Qy 636 VILGIVIFKVRDYLI 650

I : I ::: : III 
Db 581 DIIGLILISLAFYLI 595 



RESULT 14 
D96553 

hypothetical protein F5D21.6 [imported] - Arabidopsis thaliana 
C;Species: Arabidopsis thaliana (mouse-ear cress)

C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 23-Mar-2001
C;Accession: D96553

R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.;
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.;
Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L.;
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K.; Dunn, P.; Etgu, P.;
Feldblyum, T.V.; Feng, J.; Fong, B.; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.;
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L.
Nature 408, 816-820, 2000

A;Authors: Hunter, J.L.; Jenkins, J.; Johnson-Hopson, C.; Khan, S.; Khaykin, E.;
Kim, C.J.; Koo, H.L.; Kremenetskaia, I.; Kurtz, D.B.; Kwan, A.; Lam, B.;
Langin-Hooper, S.; Lee, A.; Lee, J.M.; Lenz, C.A.; Li, J.H.; Li, Y.; Lin, X.;
Liu, S.X.; Liu, Z.A.; Luros, J.S.; Maiti, R.; Marziali, A.; Militscher, J.;
Miranda, M.; Nguyen, M.; Nierman, W.C.; Osborne, B.I.; Pai, G.; Peterson, J.;
Pham, P.K.; Rizzo, M.; Rooney, T.; Rowley, D.; Sakano, H.

A;Authors: Salzberg, S.L.; Schwartz, J.R.; Shinn, P.; Southwick, A.M.; Sun, H.;
Tallon, L.J.; Tambunga, G.; Toriumi, M.J.; Town, C.D.; Utterback, T.; van Aken, S.;
Vaysberg, M.; Vysotskaia, V.S.; Walker, M.; Wu, D.; Yu, G.; Fraser, C.M.;
Venter, J.C.; Davis, R.W.

A;Title: Sequence and analysis of chromosome 1 of the plant Arabidopsis.
A;Reference number: A86141; MUID:21016719; PMID:11130712
A;Accession: D96553
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-687 <STO>
A;Cross-references: GB:AE005173; NID:g10092349; PIDN:AAG12758.1; GSPDB:GN00141
C;Genetics:
A;Gene: F5D21.6
A;Map position: 1
C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein
F12L6.1; ATP-binding cassette homology

Query Match 17.0%; Score 571.5; DB 2; Length 687; 

Best Local Similarity 28.5%; Pred. No. 1.2e-34; 

Matches 160; Conservative 120; Mismatches 216; Indels 65; Gaps 16; 



Qy 67 RQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQF 126
I : : 1 : : : I 1 : 1 1 1 : 1 1 1 1 1 1 : 1 1 1 1 : : : 1 1 1 1 : I : : I I : 1 1
Db 42 RRLLDGLNGHAEPGRIMAIMGPSGSGKSTLLDSLAGRLARNVIMTGNLLLNGKKARLD-- 99

Qy 127 QDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADF YNKKVEAVMTELSLSHVA 182
: 1 1 1 1 : : : 1 1 1 1 1 1 : 1 : 1 1 1 hi 1 II : M 1 1
Db 100 YGLVAYVTQEDILMGTLTVRETITYSAHL RLSSDLTKEEVNDIVEGTIIELGLQDCA 156

Qy 183 DQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELAR- 241
|::||::: | : | M 1 : 1 1 1 : 1 : : 1 1 = * = 11111 = 111 :| :: 1 :ll
Db 157 DRVIGNWHSRGVSGGERKRVSVALEILTRPQILFLDEPTSGLDSASAFFVIQALRNIARD 216

Qy 242 RDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDF 301
I I :: 1 1 1 1 1 1 : 1 1 1 : : 1 : 1 1 1 : 1 : : 1 1 1 : 1 1 1 : 1 1 1
Db 217 GGRTVVSSIHQPSSEVFALFDDLFLLSSGETVYFGESKFAVEFFAEAGFPCPKKRNPSDH 276

Qy 302 YMDLTSVDTQSREREIETYKRVQMLECA FKESDI YHKILENIERARYLKTLPMV 355
: : : 1 : : : : 1 : : | : | : : : | | I : 1 1 :
Db 277 FLRC I N S DFDT VT AT LKGS QRI RET PAT S D P LMNLAT S E I KARLVEN YRRS VYAKS A 333

Qy 356 PFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLF LI FYLLRVQ 409
: :: : | |: :| : :| : :: I 1 : :l 1:
Db 334 KSRIRELASIEGHHGMEVR KGSEATWFKQLRTLTKRSFVNMCRDIGYYWSRI- 385

Qy 410 NNT L KGAVQD RVGL L YQ LVGAT P YT GMLNAVN L FPML RAVSD 451
: I | | :: M : M : I I : I I I
Db 386 — VI YIWS FCVGTI FYDVGHS - YT S I LARVS CGGFI TGFMT FMS I GGFP S FI EEMKVFY 442

Qy 452 QESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPH 511 

: I || : : : : III : I I : I : I I : : : I : 

Db 443 KERLSGYYGVSVYIISNYVSSFPFLVAIALITGSITYNMVKFRPGVSHWAFFCLNIFFSV 502 

Qy 512 LIGEFLTLVLLGIVQNPNIVNSIVALLSISG-LLIGSGFIRNIQEMPIPLKI LGY 565 

: | | : | : : I I I : : : | | : : : I I I I : : : I I : : : 

Db 503 SVIESLMMWASLV — PNFLMGLITGAGIIGIIMMTSGFFRLLPDLP KVFWRYPISF 557 

Qy 566 FTFQKYCCEILVVNEFYGLNF 586

: : : : I : I I I I
Db 558 MSYGSWAIQGAYKNDFLGLEF 578



RESULT 15 
B88474 

protein C05D10.3 [imported] - Caenorhabditis elegans 
C;Species: Caenorhabditis elegans

C;Date: 10-May-2001 #sequence_revision 10-May-2001 #text_change 15-Jun-2001
C;Accession: B88474

R;anonymous, The C. elegans Sequencing Consortium.
Science 282, 2012-2018, 1998

A;Title: Genome sequence of the nematode C. elegans: a platform for
investigating biology.

A;Reference number: A75000; MUID:99069613; PMID:9851916
A;Note: see websites genome.wustl.edu/gsc/C_elegans/ and
www.sanger.ac.uk/Projects/C_elegans/ for a list of authors

A;Note: published errata appeared in Science 283, 35, 1999; Science 283, 2103,
1999; and Science 285, 1493, 1999

A;Accession: B88474
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-559 <STO>
A;Cross-references: GB:chr_III; PIDN:AAA20989.1; PID:g532111; GSPDB:GN00021;
CESP:C05D10.3
A;Note: similar to D. melanogaster white protein
C;Genetics:
A;Gene: C05D10.3
A;Map position: 3
C;Superfamily: fruit fly white protein; ATP-binding cassette homology

Query Match 16.9%; Score 570; DB 2; Length 559; 

Best Local Similarity 25.6%; Pred. No. 1.2e-34; 

Matches 150; Conservative 125; Mismatches 241; Indels 70; Gaps 13; 

Qy 67 RQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQF 126

: : I I : I I I M : : : M I M I : I I I I I : : : : I : : I : : : I : :

Db 7 KEILHNVSGMAESGKLLAILGSSGAGKTTLMNVLTSRNLTNLDVQGSILIDGRRANKWKI 66

Qy 127 QDCFS YVLQS DVFLS S LTVRET LRYTAMLALCRS S ADFYNK KVEAVMTELSLSHV 181

: : : : I I I : I : : : I I I I : : I I I : I : : I I I : I : : I

Db 67 REMSAFVQQHDMFVGTMTAREHLQFMARL RMGDQYYS DHERQLRVEQVLTQMGLKKC 123



Qy 182 ADQMIGSYN-FGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

I I : M I | : | | I : : I : I I : : : I I I : : I I I I : I I I I : I I II 
Db 124 ADTVIGIPNQLKGLSCGEKKRLSFASEILTCPKILFCDEPTSGLDAFMAGHWQALRSLA 183 

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

II : I I I I I I : : I : : : : I : : : I : : : I I I I I I I : II I 
Db 184 DNGMTVIITIHQPSSHVYSLFNNVCLMACGRVIYLGPGDQAVPLFEKCGYPCPAYYNPAD 243 

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMV 355 

: :| I ::| ::: I :|: :| I I |: I 

Db 244 HLIRTLAVIDSDRATSMKTISKIRQ GFLSTDLGQSVLA-IGNANKLRAASFVTGSDT 299 

Qy 356 PFKTKDPPGMF-GKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLF — LIFYLLR 407 

I : I I : I I : : I : : : II : I I I : : I : 
Db 300 SEKTKTFFNQDYNAS FWTQFLALFWRSWLTVI RDPNLLSVRLLQILITAFITGIVFF 356 

Qy 408 VQNNT L KGAVQ DRVGLL YQ LVGAT P YT GMLN AVNL F PML RAVS DQ ESQDGLYHK 461 

| : |::: : :| : II : :: : |: :|:| 

Db 357 -QTPVTPATIISINGIMFN H I RNMNFMLQFPNVPVI TAEL P I VLRENANGVYRT 409 

Qy 462 WQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVL 521

|| : || :| :::::: I I I I I I : : I : : : 

Db 410 SAYFLAKNIAELPQYIILPILYNTIVYWMSGLYPNFWNYCFASLVTILITNVAISISYAV 469 

Qy 522 LGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEF 581

| | :: :|: : : : I I I :| II :: II I I :||: 

Db 470 ATIFANTDVAMTILPIFWPIMAFG-GFFITFDAIPSYFKWLSSLSYFKYGYEALAINEW 528 

Qy 582 YGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTANFLI 627

: I I : I I I II 

Db 529 DSI KVIPESISRLRIKFLI 547 



Search completed: February 27, 2004, 07:18:52 
Job time: 16.5272 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model
Run on: February 27, 2004, 07:17:39 ; Search time 29.3006 Seconds
(without alignments)
4698.604 Million cell updates/sec



Title: US-09-989-981A-2
Perfect score: 3369
Sequence: 1 MGELPFLSPEGARGPHINRG.........PALVILGIVIFKVRDYLISR 652

Scoring table: BLOSUM62
Gapop 10.0 , Gapext 0.5
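The run header names the search model (`sw`, Smith-Waterman local alignment), the BLOSUM62 substitution table and affine gap penalties (Gapop 10.0, Gapext 0.5). The following Python sketch shows the Gotoh form of Smith-Waterman that parameters of this kind plug into. It is not GenCore's implementation: `toy_score` is a placeholder standing in for BLOSUM62, and charging `gap_open + k * gap_ext` for a gap of length k is an assumed convention, since the report does not say whether the extension penalty also applies to the first gap position.

```python
from typing import Callable

Score = Callable[[str, str], float]


def toy_score(x: str, y: str) -> float:
    """Placeholder substitution score (+5 identity, -4 otherwise); NOT BLOSUM62."""
    return 5.0 if x == y else -4.0


def smith_waterman_affine(a: str, b: str, score: Score = toy_score,
                          gap_open: float = 10.0, gap_ext: float = 0.5) -> float:
    """Best local alignment score with affine gaps (Gotoh recurrences).

    Assumed gap model: a gap of length k costs gap_open + k * gap_ext.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best local score ending at (i, j).
    # E: best score ending with a gap in `a` (consumes characters of `b`).
    # F: best score ending with a gap in `b` (consumes characters of `a`).
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] - gap_ext, H[i][j - 1] - gap_open - gap_ext)
            F[i][j] = max(F[i - 1][j] - gap_ext, H[i - 1][j] - gap_open - gap_ext)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0.0, H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(smith_waterman_affine("AAAAAAAAAA", "AAAAACCCCCAAAAA"))
```

With these toy scores, bridging the five unmatched residues costs 10 + 5 x 0.5 = 12.5, so the gapped alignment of the ten A's scores 50 - 12.5 = 37.5 and beats every ungapped alignment. Swapping in a real BLOSUM62 lookup for `score` would be needed to reproduce scores such as the perfect score of 3369.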



Searched: 809742 seqs, 211153259 residues

Total number of hits satisfying chosen parameters: 809742

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries
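The post-processing settings act as a window on the Query Match column followed by a cut-off on the number of summaries listed. A minimal Python sketch, assuming hits are ranked by Score (the order in which the summaries appear) and using a hypothetical `Hit` record that holds the summary-table columns:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical record for one row of the SUMMARIES table."""
    seq_id: str
    score: float
    match_pct: float
    length: int
    db: int


def post_process(hits: List[Hit], min_match: float = 0.0,
                 max_match: float = 100.0, listing: int = 45) -> List[Hit]:
    """Keep hits inside the Query Match window, best score first, at most `listing`."""
    kept = [h for h in hits if min_match <= h.match_pct <= max_match]
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:listing]


# Rows 40-45 of the summary table, deliberately out of order.
rows = [
    Hit("US-10-369-493-3919", 423, 12.6, 1549, 15),
    Hit("US-10-369-493-6199", 520, 15.4, 695, 15),
    Hit("US-09-949-029-24", 454.5, 13.5, 615, 10),
    Hit("US-10-083-357-1335", 487.5, 14.5, 545, 14),
    Hit("US-10-369-493-12899", 476.5, 14.1, 560, 15),
    Hit("US-10-369-493-3562", 480, 14.2, 551, 15),
]
for hit in post_process(rows, min_match=14.0, listing=3):
    print(hit.seq_id, hit.score)
```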



Database: Published_Applications_AA:*
1: /cgn2_6/ptodata/2/pubpaa/US07_PUBCOMB.pep:*
2: /cgn2_6/ptodata/2/pubpaa/PCT_NEW_PUB.pep:*
3: /cgn2_6/ptodata/2/pubpaa/US06_NEW_PUB.pep:*
4: /cgn2_6/ptodata/2/pubpaa/US06_PUBCOMB.pep:*
5: /cgn2_6/ptodata/2/pubpaa/US07_NEW_PUB.pep:*
6: /cgn2_6/ptodata/2/pubpaa/PCTUS_PUBCOMB.pep:*
7: /cgn2_6/ptodata/2/pubpaa/US08_NEW_PUB.pep:*
8: /cgn2_6/ptodata/2/pubpaa/US08_PUBCOMB.pep:*
9: /cgn2_6/ptodata/2/pubpaa/US09A_PUBCOMB.pep:*
10: /cgn2_6/ptodata/2/pubpaa/US09B_PUBCOMB.pep:*
11: /cgn2_6/ptodata/2/pubpaa/US09C_PUBCOMB.pep:*
12: /cgn2_6/ptodata/2/pubpaa/US09_NEW_PUB.pep:*
13: /cgn2_6/ptodata/2/pubpaa/US10A_PUBCOMB.pep:*
14: /cgn2_6/ptodata/2/pubpaa/US10B_PUBCOMB.pep:*
15: /cgn2_6/ptodata/2/pubpaa/US10C_PUBCOMB.pep:*
16: /cgn2_6/ptodata/2/pubpaa/US10_NEW_PUB.pep:*
17: /cgn2_6/ptodata/2/pubpaa/US60_NEW_PUB.pep:*
18: /cgn2_6/ptodata/2/pubpaa/US60_PUBCOMB.pep:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
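The report only says that Pred. No. is derived from the total score distribution. One common way to do that, shown here as an assumed approach rather than GenCore's documented method, is to fit an extreme-value (Gumbel) distribution to all database scores and multiply its upper-tail probability by the number of sequences searched (809742 in this run). The moment-based fit and the function names below are illustrative.

```python
import math
import statistics
from typing import Sequence, Tuple

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(scores: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments fit of a Gumbel (maximum) distribution; returns (mu, lam)."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    lam = math.pi / (sd * math.sqrt(6.0))  # scale parameter is 1 / lam
    mu = mean - EULER_GAMMA / lam          # location parameter
    return mu, lam


def pred_no(score: float, mu: float, lam: float, n_seqs: int) -> float:
    """Expected number of database sequences scoring >= score by chance."""
    # Gumbel upper tail: P(S >= s) = 1 - exp(-exp(-lam * (s - mu)))
    tail = -math.expm1(-math.exp(-lam * (score - mu)))
    return n_seqs * tail


background = [10, 12, 11, 13, 9, 15, 10, 11]  # made-up background scores
mu, lam = fit_gumbel(background)
print(pred_no(20, mu, lam, 809742))
```

A score far above the bulk of the distribution drives the tail probability, and hence the predicted count, toward zero, which is the behavior behind values such as 1.2e-34 in the records above.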

SUMMARIES 



Result           Query
   No.   Score   Match  Length  DB  ID                  Description

     1    3369   100.0     652   9  US-09-837-992-1     Sequence 1, Appli
     2    3369   100.0     652  10  US-09-989-981A-2    Sequence 2, Appli
     3  2744.5    81.5     651   9  US-09-837-992-3     Sequence 3, Appli
     4  2744.5    81.5     651  10  US-09-989-981A-6    Sequence 6, Appli


c 


z / 44 . 0 


Q 1 
O 1 . 


3 


D3 1 


Xf± 


Uo XU Ui7U 400 D 


Sequence 


6, Appli 


D 


I 1 1 1 

II / / 


04 . 


Q 


Z 3 D 


1 R 

13 


Uo XU XU4 U*i / Z / JO 


Sequence 


2795, Ap 


O 
/ 


/ U 1 . 0 


o n 
Z U . 


p 
o 


O / Z 


t n 
xu 


TTQ— n Q— Q ft Q — Q ft 1 A — A 


Sequence 


A, Appli 


o 
0 


,c n o c 

by o . o 


o n 
ZU . 


b 


boo 


i n 
1 U 


Lib - uyyoxuoux 


Sequence 


1, Appli 


9 


by J . 3 


o n 
zU . 


b 


D33 


10 


UolU4UOOUDXO 


Sequence 


13, Appl 


10 


by i . o 


o n 
zU . 


3 


b33 


Q 

y 


Uo uy yox ooo oo 


Sequence 35, Appl 


11 


b91 . o 


zU . 


3 


b33 


14 


TTQ 1 n_ 1 O C\— & ft 7 — £1 
Uo 1U 1ZU DO/ DX 


Sequence 


61, Appl 


12 


by l . o 


O A 
ZU . 


3 


D33 


10 


ttq— i H— ^n^— ftn^— 9 

Uo XU 4UO OUD Z 


Sequence 


2, Appli 


13 


bo y . o 


z U . 


3 


boo 


Q 

y 


ttq no ft ft ^ f^a— i n 

Uo~Uy~ODO OO DA X U 


Sequence 


LO, Appl 


14 


689 . 5 


zU . 


3 


boo 


1 A 

14 


Ub~ !UUyU4030 


Sequence 


5, Appli 


15 


/zoo c 

688.5 


o n 
zU . 


4 


b / o 


1 n 
1U 


TTQ_ n Q— Q ft Q — Q ft 1 Zi— ft 

Uo uy yoy yo i/\ o 


Sequence 


8, Appli 


16 


688.5 


o r\ 
zU . 


4 


b / O 


1 A 

14 


ttc in nan A^^ — i 
Uo— !UUyU430 / 


Sequence 


7, Appli 


17 


o o cr 

683.3 


zU . 


O 


/Ten 

boo 


y 


ttc no q ft f^£a_ on 

Uo — Uy ODD OD o/\ z / 


Sequence 27 , Appl 


18 


677 


o n 
20 . 


1 


bo / 


y 


TTC HQ Q ft £^^7i — 1 ^ 

Ub — uy- Ob D 0 D DA 1 4 


Sequence 


14, Appl 


19 


628 


18 . 


b 


iuy3 


lo 


ttc in q^q yi qo ono 1 ^ 
Uo— 1U — oby— 4yoZUz3 


Sequence 


2025, Ap 


20 


602 . 5 


1 / . 


y 


l U4 y 


lo 


ttc in Q^Q_/1 QQ_1 con 

Uo~iu~ oby4yoiozu 


Sequence 


1520, Ap 


21 


592 . 5 


1 / . 


b 


b / 4 


14 


ttq in nan /i c.r >i 
Uo - lUUyU40O— 4 


Sequence 


4, Appli 


22 


592.5 


1 / . 


b 


b / 4 


lb 


ttq in aoo i in 
Uo— iu— 4zy~ibuiu 


Sequence 


10, Appl 


23 


C Q £Z 

586 


1 / . 


4 


ceo 

bbo 


1 o 
lo 


ttc in i n ft — £n r_ o a r 
Uo— IU - lUobUDZ40 


Sequence 


245, App 


24 


580 


1 / . 


Z 


bob 


1 o 

lo 


ttc 1 n mo ^oi i n 
Uo - 1U U /Z — Dzl lU 


Sequence 


10, Appl 


25 


579 . 5 


17 


2 


6oo 


10 


TTQ in QdQ /IQQ CQyl 7 

Uo— iu— oby— 4 y o— 004 / 


Sequence 


5347, Ap 


26 


576.5 


1 / 


1 


b4 b 


n O 

lo 


ttc in mo coi—Q 
Uo _ lU - U /Z — bZ i y 


Sequence 


9, Appli 


27 


r- ""7 C 

3/6.3 


T ""7 
1 / 


1 


b4 b 


1 A 
14 


ttc in flQfl /l 1 ^^ — o 
Uo lU UyU fiOO Z 


Sequence 


2, Appli 


28 


574.5 


17 


1 


c no 

5y9 


lo 


ttq in oin ion i/i 
lib — 1U — Z1U — loU— 14 


Sequence 


14, Appl 


29 


C """J O C 

573.3 


1 / 


U 


bUo 


10 


TTQ in 0.£G /1QQ_R7/Ift 

Ub— iuoby4yoo /4o 


Sequence 


5748, Ap 


30 


570 


16 


9 


c c n 
339 


13 


Uo - iu— oby— 4yo— o /4U 


Sequence 


5740, Ap 


31 


a ei n c 

569 . 5 


16 


y 


COT 

bz / 


-| A 

14 


ttc in nan /i^R—ft 
ub— iu— uyu— 403— o 


Sequence 


8, Appli 


o o 

32 


5by 


n ez 
lb 


y 


bU4 


Q 

y 


ttc no n a R — i i q i 

Uo Uy / 43 /OO I;? / 


Sequence 


197, App 


33 


562 . 5 


1 tz 

16 


1 


646 


T O 

13 


ttq in ic/i /ico a 
Ub— IU— 104— 40z— 4 


Sequence 


4, Appli 


34 


558.5 


16 


<- 

6 


(Z A tZ 

b4 b 


14 


ttc in mci noi o 

ub— iu— u /y— Uo / Z 


Sequence 


2, Appli 


35 


555 . 5 


16 


c 
3 


C A (Z 

b4 b 


14 


ttc in nan / cc no 

Ub— iu— uyu— 4 00— lo 


Sequence 


13, Appl 


36 


334 . 3 


lb 


cr 
. 3 


b4b 


1 o 
lo 


ttq in i R/t /i^O — ft 
Uo - IU— 134— 43Z0 


Sequence 


8, Appli 


3 / 


C C A 

334 


1 tz 

lb 


A 

* 4 


biu 


10 


ttq in ^^Q-zi QQ_Rf;P7 
U b 1 u o o y 4 zj O O OO / 


Sequence 


5687, Ap 


o o 

38 


C. O (Z c 
3Z 6 . 3 


lo 


. b 


b / b 


10 


TTQ 1 n_OSQ_/'IQ^-07QQ 

u b - 1 u o by 4 y o— o /yy 


Sequence 


3799, Ap 


o r> 

oy 


coo c; 
3Z O . 3 


10 


. 3 


bo y 


10 


TTQ — 1 n-O^Q-/lQO._(;i ft/ 

ubiuooy4yoDX04 


Sequence 


6184, Ap 


    40     520    15.4     695  15  US-10-369-493-6199  Sequence 6199, Ap
    41   487.5    14.5     545  14  US-10-083-357-1335  Sequence 1335, Ap
    42     480    14.2     551  15  US-10-369-493-3562  Sequence 3562, Ap
    43   476.5    14.1     560  15  US-10-369-493-12899 Sequence 12899, A
    44   454.5    13.5     615  10  US-09-949-029-24    Sequence 24, Appl
    45     423    12.6    1549  15  US-10-369-493-3919  Sequence 3919, Ap



ALIGNMENTS 



RESULT 1 
US-09-837-992-1 

; Sequence 1, Application US/09837992
; Patent No. US20020081687A1
; GENERAL INFORMATION:
; APPLICANT: Tian, Hui
; APPLICANT: Schultz, Joshua
; APPLICANT: Shan, Bei
; APPLICANT: Tularik Inc.
; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions
; TITLE OF INVENTION: and Methods of Use
; FILE REFERENCE: 018781-006020US
; CURRENT APPLICATION NUMBER: US/09/837,992
; CURRENT FILING DATE: 2001-04-18
; PRIOR APPLICATION NUMBER: US 60/198,465
; PRIOR FILING DATE: 2000-04-18
; PRIOR APPLICATION NUMBER: US 60/204,234
; PRIOR FILING DATE: 2000-05-15
; NUMBER OF SEQ ID NOS: 45
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 1
; LENGTH: 652
; TYPE: PRT
; ORGANISM: Mus musculus
; FEATURE:
; OTHER INFORMATION: mouse sitosterolemia susceptibility gene (SSG)
; OTHER INFORMATION: amino acid sequence

US-09-837-992-1 

Query Match 100.0%; Score 3369; DB 9; Length 652; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 652; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy   1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60

Qy  61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
       ||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
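Each alignment line carries its own start and end residue numbers, and the match line marks identities with `|`. Both can be regenerated from the aligned rows: the end coordinate counts residues but skips gap characters, which is why a 60-column row such as `Qy 410 NNTLKG-AVQ...` ends at 468 rather than 469. The Python sketch below assumes `-` is the only gap character and leaves the definition of a conservative (`:`) pair to the caller, since the report does not spell out GenCore's rule.

```python
from typing import Callable, Optional


def end_coordinate(start: int, row: str, gap_chars: str = "-") -> int:
    """Last residue number covered by an alignment row (gaps are not counted)."""
    residues = sum(1 for ch in row if ch not in gap_chars)
    return start + residues - 1


def match_line(qy: str, db: str,
               conservative: Optional[Callable[[str, str], bool]] = None) -> str:
    """'|' for identities, ':' for pairs the caller deems conservative, ' ' otherwise."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have equal length")
    out = []
    for q, d in zip(qy, db):
        if q == d and q != "-":
            out.append("|")
        elif (conservative is not None and "-" not in (q, d)
              and conservative(q, d)):
            out.append(":")
        else:
            out.append(" ")
    return "".join(out)


row = "NNTLKG-AVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAY"
print(end_coordinate(410, row))
print(match_line("ACDE", "ACDF"))
```

For the 100% self-hit above every column is an identity, so the regenerated match line is an unbroken run of `|` characters.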





RESULT 2 

US-09-989-981A-2 

Sequence 2, Application US/09989981A
Publication No. US20030049730A1
GENERAL INFORMATION:
APPLICANT: Hobbs, Helen H.
APPLICANT: Shan, Bei
APPLICANT: Barnes, Robert
APPLICANT: Tian, Hui
APPLICANT: Tularik Inc.
APPLICANT: Board of Regents, The University of Texas System
TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
FILE REFERENCE: 018781-007320US
CURRENT APPLICATION NUMBER: US/09/989,981A
CURRENT FILING DATE: 2002-07-23
PRIOR APPLICATION NUMBER: US 60/252,235
PRIOR FILING DATE: 2000-11-20
PRIOR APPLICATION NUMBER: US 60/253,645
PRIOR FILING DATE: 2000-11-28
NUMBER OF SEQ ID NOS: 13
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 2
LENGTH: 652
TYPE: PRT
ORGANISM: Mus musculus
FEATURE:
OTHER INFORMATION: mouse ABCG5 (mABCG5)
US-09-989-981A-2 

Query Match 100.0%; Score 3369; DB 10; Length 652; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 652; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy   1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60

Qy  61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
       ||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652




RESULT 3
US-09-837-992-3
Sequence 3, Application US/09837992
Patent No. US20020081687A1
GENERAL INFORMATION:
APPLICANT: Tian, Hui
APPLICANT: Schultz, Joshua
APPLICANT: Shan, Bei
APPLICANT: Tularik Inc.
TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions
TITLE OF INVENTION: and Methods of Use
FILE REFERENCE: 018781-006020US
CURRENT APPLICATION NUMBER: US/09/837,992
CURRENT FILING DATE: 2001-04-18
PRIOR APPLICATION NUMBER: US 60/198,465
PRIOR FILING DATE: 2000-04-18
PRIOR APPLICATION NUMBER: US 60/204,234
PRIOR FILING DATE: 2000-05-15
NUMBER OF SEQ ID NOS: 45
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 3
LENGTH: 651
TYPE: PRT
ORGANISM: Homo sapiens
FEATURE:
OTHER INFORMATION: human sitosterolemia susceptibility gene (SSG)
OTHER INFORMATION: amino acid sequence
US-09-837-992-3 

Query Match 81.5%; Score 2744.5; DB 9; Length 651; 

Best Local Similarity 80.2%; Pred. No. 1.4e-258; 

Matches 523; Conservative 64; Mismatches 64; Indels 1; Gaps 1; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

I I : I I : I I : I : I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I 

Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

I : I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I : I I I 
Db 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180

I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I : : I I I I I I I I I I I I I 
Db 120 LRREQFQDCFSWLQSDTLLSSLTVl^ETLHYTALLAIRRGNPGSFQKKvTlAVMAELSLSH 179

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

|||::||:|: I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I : I I Ml 
Db 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239 

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

I I : I I I :: I I I I I I I I I I I I I I I I I I : : I I I : I M I I III I I I : II I I I I I I I I I I I 

Db 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360 

II I II I I II I I I : I I I I I I I I I I I : I I : I : I I II I : I I I I :: I I I I I I I I I I I I 

Db 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

I I I: I I I I I I I I I I I I I I : I I I III I 1:11111 II ||:|::|||::| 1111 = 111 
Db 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419 

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480

MIIM II I M I I I I I I I I I I I I : M M I II I I I I I II I I II : I II I I I II I I I : I I 
Db 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540 

: I I I M M II II I : I I I I II I I I I I II I II I I I I I II I II II II I II I M I I I : I II II I 
Db 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

: I : I : 1 I I I : I I I I I I I I I I I : I II I II I I I II I M I I II I I I I I I II M Ml 
Db 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652

I I II I M I I I II I I I I I II I I I I I I I I I I I I I I I I I I I M I M I M I II
Db 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 4 

US-09-989-981A-6 

Sequence 6, Application US/09989981A 
Publication No. US20030049730A1 
GENERAL INFORMATION: 
APPLICANT: Hobbs, Helen H. 
APPLICANT: Shan, Bei 
APPLICANT: Barnes, Robert 
APPLICANT: Tian, Hui 
APPLICANT: Tularik Inc. 

APPLICANT: Board of Regents, The University of Texas System 
TITLE OF INVENTION: ABCG5 and ABCG8 : Compositions and Methods of Use 
FILE REFERENCE: 018781-007320US
CURRENT APPLICATION NUMBER: US/09/989,981A
CURRENT FILING DATE: 2002-07-23 
PRIOR APPLICATION NUMBER: US 60/252,235 
PRIOR FILING DATE: 2000-11-20 
PRIOR APPLICATION NUMBER: US 60/253,645 
PRIOR FILING DATE: 2000-11-28 
NUMBER OF SEQ ID NOS: 13 
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 6 
LENGTH: 651 
TYPE: PRT 

ORGANISM: Homo sapiens 
FEATURE:

OTHER INFORMATION: human ABCG5 (hABCG5) 
US-09-989-981A-6 

Query Match 81.5%; Score 2744.5; DB 10; Length 651; 

Best Local Similarity 80.2%; Pred. No. 1.4e-258;

Matches 523; Conservative 64; Mismatches 64; Indels 1; Gaps 1; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60

I I : I I : I I : I : I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

I : I : I I I I I I I I I I I : I I I I I I I I I I I I I I II I I I I I I : I I I I I II I I I : I I I
Db 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180

I || : I I I I I I I I I I I I I I I I I I I I I I I II I : I I : I : : I I I I I I I MINI
Db 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

|||::||:|: I I I I : I I I I I I I I I I I I I I I II I I : I I I I I I II I I I I I I I I : I I Ml
Db 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

|:IM::| I I M I I I Ml I M I I I I : M I h I I I I I Ml I I M II I I I I II I II I I
Db 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360

M I I I I M I I I I = I I I I I I I 1 I i I * I M:l I M 1 = 1111 ::MIIIIIIIIII
Db 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420

I I I : I I I I I I I I I I I I I I : I I I III I I : I I I I I I I I I : I : : I I I : : I I I I I : I M 
Db 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480 

I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I 1111:111 I I I I I I I I : I I 

Db 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540

: I I II II I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I : I I I I I I 
Db 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

: I : I : I I I I : I I I I I I I I I II: I I I I I I I I I I I I I I I I I I I I I I I I II I : : I I 
Db 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652

II I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I I : I I I I 

Db 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651
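Across these records, the Best Local Similarity figure appears to equal Matches divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). For RESULT 4 that is 523 / (523 + 64 + 64 + 1) = 523 / 652, or about 80.2%. The sketch below checks this reading against the counts reported in RESULTs 4, 6 and 7; the function name is hypothetical and the formula is inferred from the reported numbers, not taken from the search program's documentation.

```python
def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Percent identity over all alignment columns.

    Assumption (inferred, not documented): the denominator is the full
    local alignment length, where every column is a match, a conservative
    substitution, a mismatch, or an indel.
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


if __name__ == "__main__":
    # (Matches, Conservative, Mismatches, Indels) as reported in RESULTs 4, 6, 7
    for counts in [(523, 64, 64, 1), (219, 23, 14, 0), (194, 131, 245, 97)]:
        print(counts, round(best_local_similarity(*counts), 1))  # 80.2, 85.5, 29.1
```

The same formula reproduces the 29.0% reported for the ABCG2 hits (181 / 624).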



RESULT 5 
US-10-090-455-6 

; Sequence 6, Application US/10090455 

; Publication No. US20030027259A1 

; GENERAL INFORMATION: 

; APPLICANT: Chen, Hongyun 

; APPLICANT: Le Bihan, Stephane

; TITLE OF INVENTION: NOVEL ABCG4 TRANSPORTER AND USES THEREOF 
; FILE REFERENCE: 100103.406 

; CURRENT APPLICATION NUMBER: US/10/090,455 
; CURRENT FILING DATE: 2002-03-01 
; NUMBER OF SEQ ID NOS: 17 

; SOFTWARE: FastSEQ for Windows Version 4.0 
; SEQ ID NO 6 

LENGTH: 651 

TYPE: PRT 
; ORGANISM: Homo sapiens 
US-10-090-455-6 

Query Match 81.5%; Score 2744.5; DB 14; Length 651; 

Best Local Similarity 80.2%; Pred. No. 1.4e-258; 

Matches 523; Conservative 64; Mismatches 64; Indels 1; Gaps 1; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

I I : I I : I I : I : I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I 

Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

I : I : I I I I II I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I : I II 
Db 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180

111:1111111111111 I 1 M I I I I M I I I : I | : I : : I I I I I I I I I I I I I 
Db 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

I I I : : I I : I : I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I II I I I I I I I I I : I I III
Db 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239



Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

I I : I I I : : I I I I I I I I I II I I I I I I I : : I I I : I I I I I III I I I : I I I I I I I I I I I I I 
Db 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360

I I I I I I I I I I I I : I I II I I I I I I I : I I : I : I I II I : I I I I :: I I I I I I I I I I I I 
Db 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420

I I I : I I I I I I I I I I I I I I : I I I III I I : I I I I I I I I I : I : : I I I : : I I I I I : II I 
Db 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419 

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480

I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I II I I I I I I : I I I I I I I I I I I : I I 

Db 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540

: I I I I I M I M II : I I I I I M I M I I I I I I I I I M I I I M I I I I I I I I I I I I I : II I I I I 
Db 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

: I : I : I I I I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I II I : : I I 
Db 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652

II I I I : I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I : I I : I I : I I I I 

Db 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 6 

US-10-104-047-2795 

; Sequence 2795, Application US/10104047 
; Publication No. US20030236392A1 
; GENERAL INFORMATION: 

; APPLICANT: HELIX RESEARCH INSTITUTE 

; TITLE OF INVENTION: No. US20030236392Alel full length cDNA 
; FILE REFERENCE: H1-A0105 

; CURRENT APPLICATION NUMBER: US/10/104,047 

; CURRENT FILING DATE: 2002-03-25 

; PRIOR APPLICATION NUMBER: 

; PRIOR FILING DATE: 

; NUMBER OF SEQ ID NOS : 4096 

; SOFTWARE: PatentIn Ver. 2.1

; SEQ ID NO 2795 

LENGTH: 256 

TYPE: PRT 
; ORGANISM: Homo sapiens 
US-10-104-047-2795 

Query Match 34.9%; Score 1177; DB 15; Length 256; 

Best Local Similarity 85.5%; Pred. No. 3.5e-106; 

Matches 219; Conservative 23; Mismatches 14; Indels 0; Gaps 0; 



Qy 397 MGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQD 456 



M I I I • I * • I ! I • • I MM*MMIMM I II II II II II II II II • II II II 

Db 1 MGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQD 60

Qy 457 GLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEF 516

III lllhlll I I I I I I I I : I I : I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I 
Db 61 GLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEF 120

Qy 517 LTLVLLGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEIL 576

I I I II I I I I I I I I I I I I : I I I II I : I : I : I I I I : I I I I I I I I I I I : I I I I I I I I II I 
Db 121 LTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEIL 180

Qy 577 VVNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTANFLILYGFIPALV 636

I I I I I I I I I I I I I II I : : I I I I I II : I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 VVNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALV 240

Qy 637 ILGIVIFKVRDYLISR 652 

I I I I I : I I : I I : I M I 
Db 241 ILGIVVFKIRDHLISR 256
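Each alignment row gives the numbers of the first and last residues in its segment, so the count of residue letters (ignoring "-" gap columns) should equal end - start + 1. A row that fails this test has probably gained or lost residues in transcription, for example when "VV" is read as "W". The following is a minimal Python sketch of that check; the helper name `row_is_consistent` and the single-line `label start sequence end` layout it parses are assumptions for illustration.

```python
import re

# Assumed row layout: label, start position, sequence (letters and '-' gaps), end position.
ROW = re.compile(r"^(Qy|Db)\s+(\d+)\s+([A-Z-]+)\s+(\d+)$")


def row_is_consistent(line: str) -> bool:
    """Return True if the residue count matches the numbered span of the row."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"not a clean alignment row: {line!r}")
    _label, start, seq, end = m.groups()
    residues = len(seq.replace("-", ""))  # gap columns carry no residue number
    return int(end) - int(start) + 1 == residues


if __name__ == "__main__":
    print(row_is_consistent("Qy 637 ILGIVIFKVRDYLISR 652"))  # True: 16 letters, 16 positions
    print(row_is_consistent("Db 241 ILGIWFKIRDHLISR 256"))   # False: 15 letters, 16 positions
```

Rows whose gaps were flattened into spaces by OCR raise `ValueError` here rather than being silently accepted.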



RESULT 7 

US-09-989-981A-4 

Sequence 4, Application US/09989981A 
Publication No. US20030049730A1 
GENERAL INFORMATION: 
APPLICANT: Hobbs, Helen H. 
APPLICANT: Shan, Bei 
APPLICANT: Barnes, Robert 
APPLICANT: Tian, Hui 
APPLICANT: Tularik Inc. 

APPLICANT: Board of Regents, The University of Texas System 
TITLE OF INVENTION: ABCG5 and ABCG8 : Compositions and Methods of Use 
FILE REFERENCE: 018781-007320US 
CURRENT APPLICATION NUMBER: US/09/989,981A
CURRENT FILING DATE: 2002-07-23 
PRIOR APPLICATION NUMBER: US 60/252,235 
PRIOR FILING DATE: 2000-11-20 
PRIOR APPLICATION NUMBER: US 60/253,645 
PRIOR FILING DATE: 2000-11-28 
NUMBER OF SEQ ID NOS: 13 
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 4 
LENGTH: 672 
TYPE: PRT 

ORGANISM: Mus musculus
FEATURE:

OTHER INFORMATION: mouse ABCG8 (mABCG8)
US-09-989-981A-4 



Query Match 20.8%; Score 701.5; DB 10; Length 672; 

Best Local Similarity 29.1%; Pred. No. 4.2e-59; 

Matches 194; Conservative 131; Mismatches 245; Indels 97; Gaps 19; 

Qy 27 QGSVTGTEARHSLGVLHVSYS VSNRVGPW WNIKS 60 

II: :|: :| |: :|| ::::| II II 

Db 24 QDSLFSSESDNS LYFTYSGQSNTLEVRDLTYQVDIASQV-PWFEQLAQFKIPWRSHS 79 



Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

I : I : : : I : I I I : : I : I I I I I : : I I I I : I I | : : : : | I 

Db 80 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 138 

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLS 179

: I : : I I I I : I I I I I I I : I : I I : I :|:|| |: II I 

Db 139 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 198 

Qy 180 HVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL 239

I : : I : I : I I I I I I I I I I I I : I : : : I I I I I : I I I I I : : I I : I 

Db 199 QCANTRVGNTWRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 258 

Qy 240 ARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

I : : I : I : : : : I M I I : : I : I I : : : I I : : I : : I : : | : I : | I I : I I I 
Db 259 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 318 

Qy 300 DFYMDLTSVDTQSREREIETYKRVQMLECAFKE SDIYHKI-LENIERARYLKTLP 353 

I I I : I I I I : I : I : I I I : I : : I I II I I : : : : :| 

Db 319 DFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT 378 

Qy 354 MVPFKTKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYL 405 

: I : I III: I : I I I I : : : : : : | I : | 

Db 379 L TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF-- 432 

Qy 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465

I : : : I I I : : I : : I : I : I : : I : I I II 

Db 433 LYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGPYF 492

Qy 466 LAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL TL 519 

I : I I I : I : : II II I I I I : : I I : 

Db 493 FAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLLVWLWFCCRTM 544 

Qy 520 VLLGIVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILW 578 

I I , ::| : : : I I : i : : | : :| ::| |: 

Db 545 ALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQ 604 

Qy 579 NEFYGL NFTCGGSNTSML NHPMCA 1 T QGVQ F I EKT C P GAT S RFT 622 

: I I III : I : : I I : I I I : : 
Db 605 IQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISY 651 

Qy 623 ANFLILY 629 

I I I I 

Db 652 -GFLFLY 657 



RESULT 8 
US-09-961-086-1 

; Sequence 1, Application US/09961086 
; Publication No. US20030036645A1 
; GENERAL INFORMATION: 

; APPLICANT: UNIVERSITY OF MARYLAND, BALTIMORE 
; APPLICANT: ROSS, Douglas D. 
; APPLICANT: DOYLE, L. Austin 
; APPLICANT: ABRUZZO, Lynne 

; TITLE OF INVENTION: BREAST CANCER RESISTANCE PROTEIN (BCRP) AND THE DNA 
; TITLE OF INVENTION: WHICH ENCODES IT 
; FILE REFERENCE: EP 1937 6- 019 



; CURRENT APPLICATION NUMBER: US/09/961,086 

; CURRENT FILING DATE: 2001-09-21 

; PRIOR APPLICATION NUMBER: US 60/073,763 

; PRIOR FILING DATE: 1998-02-05 

; PRIOR APPLICATION NUMBER: PCT/US99/02577

; PRIOR FILING DATE: 1999-02-05 

; NUMBER OF SEQ ID NOS : 7 

; SOFTWARE: PatentIn Ver. 2.1

; SEQ ID NO 1 

LENGTH: 655 

TYPE: PRT 
; ORGANISM: Homo sapiens 
US-09-961-086-1 

Query Match 20.6%; Score 693.5; DB 10; Length 655; 

Best Local Similarity 29.0%; Pred. No. 2.4e-58; 

Matches 181; Conservative 142; Mismatches 246; Indels 55; Gaps 16;

Qy 25 LEQGSVTGTEARHS LGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDV 73 

: I I : I I I I : : I I : I I : : : : : I I : : 

Db 12 VSQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSG — — FLPCRKPVEKEILSNI 67 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133 

: :: I : I I I : I I I : : I I I : : I :| I |:| :|| I |: II 
Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLINGAP-RPANFKCNSGYV 124 

Qy 134 LQS DVFL S S LTVRET LRYTAMLALCRS S ADF- YNKKVEAVMT EL S L SHVADQMI GS YNFG 192 

:| II : :IMII |:::| II : : |::: hill III :|: 
Db 125 VQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIR 184 

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252

I : I I I I : I I I : I : I I : : II I I I I I I I I I I : : I I I : : : : I : I : I I I 

Db 185 GVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQ 244

Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311

II : I : I I : : I I hi I : I I I : I : I I I : : I I I I : : I : : I : 

Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304 

Qy 312 - SREREI ETYKRVQMLECAFKESDI YHKI LENIERARYLKT 351 

: I I I : I : : : I : : : : I : I I : : : 

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363 

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411 

: : I : I : : I : I I : I I I I : : : : : : M : : : I : 

Db 364 FKEISYTT S FCHQLRWVSKRS FKNLLGNPQAS IAQI I VTWLGLVI GAI YFGLKND 419 

Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470 

: : | : | | : I : I : : : : I I II : : : I II I : I 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530 

:|| ::: ::||: : I: III I: I |: :: : : I I :: 
Db 477 DLLPMTMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLM MVAYS AS SMALAI AAGQSV 533 

Qy 531 VNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTC 588

I : I : : I : : I I I : I : : I III: : I I I I I I I I I 

Db 534 VSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592 



Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612

I I : I I I ::: I 
Db 593 PGLNATGNNPCNYATCTGEEYLVK 616



RESULT 9 

US-10-405-806-13 

; Sequence 13, Application US/10405806 

; Publication No. US20030232362A1 

; GENERAL INFORMATION: 

; APPLICANT: KOMATANI, HIDEYA 

; APPLICANT: HARA, YOSHIKAZU

; APPLICANT: KOTANI, HIDEHITO 

; APPLICANT: NAKAGAWA, RINAKO 

; TITLE OF INVENTION: DRUG RESISTANT GENE AND USE THEREOF 

; FILE REFERENCE: 234985US0CONT 

; CURRENT APPLICATION NUMBER: US/10/405,806

; CURRENT FILING DATE: 2003-04-03 

; PRIOR APPLICATION NUMBER: PCT/JP01/08112

; PRIOR FILING DATE: 2001-09-18 

PRIOR APPLICATION NUMBER: JP2000-303441 
; PRIOR FILING DATE: 2000-10-03 
; NUMBER OF SEQ ID NOS : 17 
; SOFTWARE: PatentIn version 3.2
; SEQ ID NO 13 
; LENGTH: 655 
; TYPE: PRT 

; ORGANISM: Artificial Sequence 
FEATURE: 

OTHER INFORMATION: ABCG2 482T mutant sequence
US-10-405-806-13 

Query Match 20.6%; Score 693.5; DB 15; Length 655; 

Best Local Similarity 29.0%; Pred. No. 2.4e-58; 

Matches 181; Conservative 142; Mismatches 246; Indels 55; Gaps 16; 

Qy 25 LEQGSVTGTEARHS LGVLHVS YSVSNRVGPWWNI KSCQQKWDRQI LKDV 73 

: I I : I I I I : : I I : I I : : : : : I I : : 

Db 12 VSQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSG FLPCRKPVEKEILSNI 67 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133 

: : : I : I I I : I M : : I I I : : I : I I I : I : I I I I : II 
Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLINGAP-RPANFKCNSGYV 124 

Qy 134 LQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFG 192

: I I I : : I I I I I I : : : I I I : : I : : : hill III : I : 
Db 125 VQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIR 184

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252

I : I I I I : I I I : I : I I :: I II I I I I I I I I I : : I II : : : : I : I : I I I 

Db 185 GVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQ 244 

Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311 

II : | : | | : : I I I : I I : I I I : I : I I I : : I I I I : : I : : I : 

Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304 



Qy 312 -SREREIETYKRVQMLECAFKESDIYHKI LENIERARYLKT 351 

: M | : I : : : I : : : : I : I I : : : 

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363 

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411 

: : I :| : :| :||: I II I ::: :::|| : ::h 

Db 364 FKEISYTT S FCHQLRWVS KRS FKNLLGNPQAS I AQI I VT WLGLVI GAI YFGLKND 419 

Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470 

: : I : I I : I : I : : : : | | | | : : : I II I : I 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530

: M : : : : : I I : : I : I I I I : I I : : : : : I I : : 

Db 477 DLLPMTMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAIAAGQSV 533 

Qy 531 VNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTC 588

| : | : : | : : I I I : I : : I III: : I I I I I I I I I 

Db 534 VSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592

Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612 

I I : I I I ::: I 

Db 593 PGLNATGNNPCNYATCTGEEYLVK 616 



RESULT 10 
US-09-981-353-35 

; Sequence 35, Application US/09981353 

; Patent No. US20020160382A1 

; GENERAL INFORMATION: 

; APPLICANT: Lasek, Amy W. 

; APPLICANT: Jones, David A. 

; TITLE OF INVENTION: GENES EXPRESSED IN COLON CANCER 
; FILE REFERENCE: PA-0038 US 

; CURRENT APPLICATION NUMBER: US/09/981,353
; CURRENT FILING DATE: 2001-10-11 
; NUMBER OF SEQ ID NOS : 194 

SOFTWARE: PERL Program 
; SEQ ID NO 35 

LENGTH: 655 

TYPE: PRT 
; ORGANISM: Homo sapiens 
; FEATURE : 

NAME/KEY: misc_feature 
; OTHER INFORMATION: Incyte ID No. US20020160382A1 5517972CD1 
US-09-981-353-35 

Query Match 20.5%; Score 691.5; DB 9; Length 655; 

Best Local Similarity 29.0%; Pred. No. 3.8e-58; 

Matches 181; Conservative 141; Mismatches 247; Indels 55; Gaps 16; 

Qy 25 LEQGSVTGTEARHS LGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDV 73 

: ||: I I I | :: | | : | |:: :::|| :: 

Db 12 VSQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSG FLPCRKPVEKEILSNI 67 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133 

: : : I : I I I : I I I : : I I I : : I : I I I : I : II I I : II
Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLINGAP-RPANFKCNSGYV 124

Qy 134 LQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFG 192 

:| || : :||||| |:::| I I : : |::: hill III :|: 
Db 125 VQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIR 184

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252

| : | | | | : | I I : I : I I :: I II I I I I I I I I I : : I I I : : : : I : I : I I I 
Db 185 GVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQ 244 

Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311 

|| : | : M : : I I '|:| I :| Ihl : II I : : I I I I : : I : : I : 
Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304 

Qy 312 -SREREIETYKRVQMLECAFKESDIYHKI LENIERARYLKT 351 

: | I I : I ::: I : :: : I : I I : : : 

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363 

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411 

: : I :| : :| :||: I II I ::: :::|l : ::|: 

Db 364 FKEISYTT SFCHQLRWVSKRS FKNLLGNPQAS IAQI IVTWLGLVI GAI YFGLKND 419 

Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470

: : | : I I : I : I :: ::|| II : : : I I I I :l 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530 

: I I : : : : | | : : I : I I I I : I I : . : : : : I I : : 
Db 477 DLLPMRMLPSI I FTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAI AAGQS V 533 

Qy 531 VNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTC 588

| : | : : | : : | I I : I : : I III: : I I I M I II I 

Db 534 VSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592 

Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612 

I I : I I I ::: I 

Db 593 PGLNATGNNPCNYATCTGEEYLVK 616



RESULT 11 
US-10-120-687-61 

; Sequence 61, Application US/10120687 
; Publication No. US20030082155A1 
; GENERAL INFORMATION: 

; APPLICANT: Massachusetts General Hospital 

; TITLE OF INVENTION: Stem Cells of the Islets of Langerhans and Their Use in
; TITLE OF INVENTION: Treating Diabetes Mellitus
; FILE REFERENCE: 3284/1235B 

; CURRENT APPLICATION NUMBER: US/10/120,687 

; CURRENT FILING DATE: 2002-04-11 

; PRIOR APPLICATION NUMBER: US60/169082 

PRIOR FILING DATE: 1999-12-06 
; PRIOR APPLICATION NUMBER: US 09/963,875 
; PRIOR FILING DATE: 2001-09-25 
; PRIOR APPLICATION NUMBER: US 60/215109 
; PRIOR FILING DATE: 2000-06-28 



; PRIOR APPLICATION NUMBER: US 60/238880 

; PRIOR FILING DATE: 2000-10-06 

; PRIOR APPLICATION NUMBER: US 09/731261 

; PRIOR FILING DATE: 2000-12-06 

; NUMBER OF SEQ ID NOS : 61 

SOFTWARE: PatentIn version 3.1
; SEQ ID NO 61 

LENGTH: 655 
; TYPE: PRT 

; ORGANISM: Homo sapiens 
US-10-120-687-61 



Query Match 20.5%; Score 691.5; DB 14; Length 655; 

Best Local Similarity 29.0%; Pred. No. 3.8e-58; 

Matches 181; Conservative 141; Mismatches 247; Indels 55; Gaps 16; 

Qy 25 LEQGSVTGTEARHS LGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDV 73

: ||: | I I I :: I I : I |:: :::ll :: 

Db 12 VSQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSG— — FLPCRKPVEKEILSNI 67 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133 

: : : I : I I I : I I I : : I I I : : I : I I i : I : I I I I : II 
Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLINGAP-RPANFKCNSGYV 124 

Qy 134 LQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFG 192

: I I I : : I I I I I I ::: I I I : : |::: hill Ml :h 
Db 125 VQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIR 184 

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252

|: I I I I : I I I : h I I :: I I I I I I I I I I I I : : I II : : : : I : I : I I I 
Db 185 GVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQ 244 

Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311 

|| : |: I I : : I I I : I I : I I h I : I I I :: I I I I :: h : h 
Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304

Qy 312 -SREREIETYKRVQMLECAFKESDIYHKI LENIERARYLKT 351 

: | | | : I : : : I : : : : I : I h : : 

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363 

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411 

: : I : U : : I : I I : I I I I :::::: I I : : : h 

Db 364 FKEISYTT S FCHQLRWVSKRS FKNLLGNPQAS I AQI I VTWLGLVI GAI YFGLKND 419 

Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470

: : |: I I : I : I : : : : I I I I : : : I II I : I 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530 

: I I : : : : I I : : h I I I h I h : ' : : I I : : 

Db 477 DLLPMRMLPSI I FTCIVYFMLGLKPKADAFFVMMFTLM MVAYSASSMALAIAAGQSV 533 

Qy 531 VNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTC 588

| : | : : | : : I I I : h : I I I h : I I II I I I I I 

Db 534 VSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592

Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612

I I : I I I ::: I 

Db 593 PGLNATGNNPCNYATCTGEEYLVK 616 



RESULT 12 
US-10-405-806-2 

; Sequence 2, Application US/10405806 

; Publication No. US20030232362A1 

; GENERAL INFORMATION : 

; APPLICANT: KOMATANI, HIDEYA 

; APPLICANT: HARA, YOSHIKAZU 

; APPLICANT: KOTANI, HIDEHITO 

; APPLICANT: NAKAGAWA, RINAKO 

; TITLE OF INVENTION: DRUG RESISTANT GENE AND USE THEREOF 

; FILE REFERENCE: 234985US0CONT 

; CURRENT APPLICATION NUMBER: US/10/405,806

; CURRENT FILING DATE: 2003-04-03 

; PRIOR APPLICATION NUMBER: PCT/JP01/08112

; PRIOR FILING DATE: 2001-09-18 

; PRIOR APPLICATION NUMBER: JP2000-303441 

; PRIOR FILING DATE: 2000-10-03 

; NUMBER OF SEQ ID NOS : 17 

; SOFTWARE: PatentIn version 3.2

; SEQ ID NO 2 

LENGTH: 655 
; TYPE: PRT 

; ORGANISM: Homo sapiens 
US-10-405-806-2 



Query Match 20.5%; Score 691.5; DB 15; Length 655; 

Best Local Similarity 29.0%; Pred. No. 3.8e-58; 

Matches 181; Conservative 141; Mismatches 247; Indels 55; Gaps 



Qy 25 LEQGSVTGTEARHS LGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDV 73

: | | : | I 1 1 : : 1 1 : 1 | : : : : : | | : :

Db 12 VSQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSG FLPCRKPVEKEILSNI 67

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133

: :: | : I I I : I 1 1 : : 1 1 1 : : 1 :| 1 |:| :|l 1 |: II

Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLINGAP-RPANFKCNSGYV 124

Qy 134 LQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFG 192

:| || : :||||| |:::| I I : : 1::: hill Ml :|:

Db 125 VQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIR 184

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252

| : I I I h 1 1 1 : h 1 1 :: II 1 1 1 1 1 1 1 1 1 1 : : 1 1 1 : : : : 1 : 1 : 1 1 1

Db 185 GVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQ 244

Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311

M :|: || : :| 1 hi 1 :| Ihl : II 1 ::|| ll::h : h

Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304

Qy 312 -SREREIETYKRVQMLECAFKESDIYHKI LENIERARYLKT 351

: | | | : I ::: 1 : :: : h 1 h : :

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411 

: : I : I : : I : I I : I I I I : : : : : : I I : : : I : 

Db 364 FKEISYTT SFCHQLRWVSKRSFKNLLGNPQASIAQIIVTWLGLVIGAIYFGLKND 419 

Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470

: : I : I I : I : I : : : : I I I I : : : I I I I : I 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530 

: I I :: :: I I : : I : I I I I : I I: :: : : I I :: 
Db 477 DLLPMRMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLM MVAYSASSMALAIAAGQSV 533

Qy 531 VNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTC 588

| : | : : | : : I I I : I : : I III: : I I I I I I I I I 

Db 534 VSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592

Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612

I I : I | | ::: I 

Db 593 PGLNATGNNPCNYATCTGEEYLVK 616 



RESULT 13 
US-09-866-866A-10 

; Sequence 10, Application US/09866866A 

; Patent No. US20020102244A1 

; GENERAL INFORMATION: 

; APPLICANT: Sorrentino, Brian 

; APPLICANT: Schuetz, John 

; TITLE OF INVENTION: A Method of Identifying and/or Isolating Stem Cells 

; FILE REFERENCE: 1340-1-021CIP2 

; CURRENT APPLICATION NUMBER: US/09/866,866A

; CURRENT FILING DATE: 2001-08-30 

; PRIOR APPLICATION NUMBER: 09/584,586 

; PRIOR FILING DATE: 2000-05-31 

; PRIOR APPLICATION NUMBER: PCT/US99/11825 

; PRIOR FILING DATE: 1999-05-27 

; PRIOR APPLICATION NUMBER: 60/086,988 

; PRIOR FILING DATE: 1998-05-28 

; NUMBER OF SEQ ID NOS : 27 

; SOFTWARE: PatentIn version 3.0

; SEQ ID NO 10 

; LENGTH: 655 

; TYPE: PRT 

; ORGANISM: Homo sapien 

US-09-866-866A-10 

Query Match 20.5%; Score 689.5; DB 9; Length 655; 

Best Local Similarity 29.0%; Pred. No. 6e-58; 

Matches 181; Conservative 141; Mismatches 247; Indels 55; Gaps 16; 

Qy 25 LEQGSVTGTEARHS LGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDV 73 

: II: I I I | :: | | : | |:: :::|| :: 

Db 12 VSQGNTNGFPATVSNDLKAFTEGAVLSFHNICYRVKLKSG FLPCRKPVEKEILSNI 67 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133 

: : : I : I I I : I I I : : I I I : : I : I I I : I : I I I I : II 
Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLINGAP-RPANFKCNSGYV 124



Qy 134 LQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFG 192

:| II : :||||| |:::| II : : I::: |: II I III :h 
Db 125 VQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIEELGLDKVADSKVGTQFIR 184

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252

|:| llhl II : I : I I :: I I I I I I I II III ::IM :::: I : I : H I 
Db 185 GVSGGERKRTSIGMELITDPSILSLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQ 244 

Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311

II :|: || : :| I I : I I :| Ihl : II I : : I I I I : : I : : I : 
Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304 

Qy 312 -SREREIETYKRVQMLECAFKESDIYHKI LENI ERARYLKT 351 

: I I I :| :::| : :: : I: I 1= = = 

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363 

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411

: : | : I : : I : I I : I I I I : : : : : : I I : : : I : 

Db 364 FKEISYTT S FCHQLRWVS KRS FKNLLGNPQAS I AQI I VTWLGLVI GAI YFGLKND 419 

Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470

: : | : | | : | : | : : : : I I I I : : : I II I : I 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530

: I I : : : : I I : : I : I I I I : I I : : : : : I I : : 

Db 477 DLLPMRMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAIAAGQSV 533 

Qy 531 VNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTC 588

I : I : : I : : I I I : I : : I III: : I I I I I I I I I 

Db 534 VSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592

Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612 

I I : I I I ::: I 

Db 593 PGLNATGNNPCNYATCTGEEYLVK 616 



RESULT 14 
US-10-090-455-5 

; Sequence 5, Application US/10090455 

; Publication No. US20030027259A1 

; GENERAL INFORMATION: 

; APPLICANT: Chen, Hongyun 

; APPLICANT: Le Bihan, Stephane 

; TITLE OF INVENTION: NOVEL ABCG4 TRANSPORTER AND USES THEREOF 
; FILE REFERENCE: 100103.406 

; CURRENT APPLICATION NUMBER: US/10/090,455
; CURRENT FILING DATE: 2002-03-01 
; NUMBER OF SEQ ID NOS : 17 

; SOFTWARE: FastSEQ for Windows Version 4.0 
; SEQ ID NO 5 
; LENGTH: 655 

TYPE: PRT 
; ORGANISM: Homo sapiens 
US-10-090-455-5 



Query Match 20.5%; Score 689.5; DB 14; Length 655; 

Best Local Similarity 29.0%; Pred. No. 6e-58; 

Matches 181; Conservative 141; Mismatches 247; Indels 55; Gaps 16; 

Qy 25 LEQGSVTGTEARHS LGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDV 73 

: I |: I I I I :: I I : I |:: :::| I :: 

Db 12 VSQGNTNGFPATVSNDLKAFTEGAVLSFHNICYRVKLKSG FLPCRKPVEKEILSNI 67 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133 

: :: I : I I I : I I I : : I I I : : I : I I I : I : I I I I : II 
Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLINGAP-RPANFKCNSGYV 124 

Qy 134 LQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFG 192 

: I I I : : I I I I I I : : : I I I : : I : : : hill III : h 
Db 125 VQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIEELGLDKVADSKVGTQFIR 184 

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252 

I : I I I I : I I I : I : II : : I I I I I I I I I I I I : : I I I : : : : I : I : I I I 

Db 185 GVSGGERKRTSIGMELITDPSILSLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQ 244 

Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311 

II : I : I I : : I I I : I I : I I h I : I I I : : I I I h : h : h 

Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304 

Qy 312 -SREREIETYKRVQMLECAFKESDIYHKI LENIERARYLKT 351 

: I I I : I : : : I : : : : I : I h : : 

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363 

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411 

: : I : I : : I : I I : I I I I : : : —Ml : : : h 

Db 364 FKEISYTT S FCHQLRWVS KRS FKNLLGNPQAS I AQI I VTWLGLVI GAI YFGLKND 419 

Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470 

: : I: I h I: I :: :: I I I I : : : I II I : I 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476 

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530 

: I I : : : : | |: : h I I I h I h : : : : I I : : 

Db 477 DLLPMRMLPSI I FTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAIAAGQSV 533 

Qy 531 VNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTC 588 

I : I: : I : : I I I : h : I I I h : I I I I I I I I I 
Db 534 VSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592 

Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612 

I I : I I I ::: I 
Db 593 PGLNATGNNPCNYATCTGEEYLVK 616 



RESULT 15 
US-09-989-981A-8 

; Sequence 8, Application US/09989981A 

; Publication No. US20030049730A1 

; GENERAL INFORMATION: 

; APPLICANT: Hobbs, Helen H. 

; APPLICANT: Shan, Bei 

; APPLICANT: Barnes, Robert 



; APPLICANT: Tian, Hui 
; APPLICANT: Tularik Inc. 

; APPLICANT: Board of Regents, The University of Texas System 

; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use 

; FILE REFERENCE: 018781-007320US 

; CURRENT APPLICATION NUMBER: US/09/989,981A 

; CURRENT FILING DATE: 2002-07-23 

; PRIOR APPLICATION NUMBER: US 60/252,235 

; PRIOR FILING DATE: 2000-11-20 

; PRIOR APPLICATION NUMBER: US 60/253,645 

; PRIOR FILING DATE: 2000-11-28 

; NUMBER OF SEQ ID NOS : 13 

; SOFTWARE: PatentIn Ver. 2.1 

; SEQ ID NO 8 

; LENGTH: 673 

; TYPE: PRT 
; ORGANISM: Homo sapiens 

; FEATURE: 

; OTHER INFORMATION: human ABCG8 (hABCG8) 
US-09-989-981A-8 

Query Match 20.4%; Score 688.5; DB 10; Length 673; 

Best Local Similarity 28.1%; Pred. No. 7.8e-58; 

Matches 188; Conservative 125; Mismatches 233; Indels 123; Gaps 16; 

Qy 37 HSLGVLHVSYSV — SNRVGPW WNIKSCQQKWDRQILKDVSLYIESGQIMC 84 

: : I I : : I I : : : I I I I III : I : : : I : I I I : : 

Db 45 NTLEVRDLN YQVDLASQV- PWFEQLAQFKMPWT S P SCQNS CELGI -QNLS FKVRSGQMLA 102 

Qy 85 ILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLT 144 

I : I I I I I :: I I I I : I I I : : : : I I : | : : | I : I : I I 

Db 103 IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLT 162 

Qy 145 VRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVS 203 

I I I I I : I : I I : I : I : I I I : I I I II : I : I : I I I I II I I 

Db 163 VRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVS 222 

Qy 204 IAAQLLQDPKVMMLDEPTTGLDCMTANQIVXLLAELARRDRIVIVTIHQPRSELFQHFDK 263 

I I I I : I : : : I I I I I : I I I I I : : I I : I I : : I : I : : : : I I I I I : : I : I I 
Db 223 IGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDL 282 

Qy 264 IAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRV 323 



Db 283 VLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKA 342 

Qy 324 QMLECAFKESDIYHKILENIERARYL KTLPM VPFKT 359 

III : I : I I I : : I I 

Db 343 QSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPT 390 

Qy 360 KDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQD 419 

III : |:|| ||: ::: : :| : : I I : ::: : I 

Db 391 K-MPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGF — LYFGHGSIQLSFMD 447 

Qy 420 RVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIA 479 

II:: I : : I : : : I I : I : I I I I I : I II 

Db 448 TAALLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAY 507 



Qy 480 TVI FS SVC YWTLGLYPEVARF GYFSAALLAPHLI GEFLTLVLL 522 

: I : II I I : I : I I I I : I : I 

Db 508 IIIYGMPTYWLANLRPGLQPFLLHFLLWLWFCCRIMALAAAALLPTFHMASFFSNAL- 566 

Qy 523 GIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEF- 581 

: : ||: |: : : :| ::| I |: :| 

Db 567 YNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFS 609 

Qy 582 YGL NFTCGGSNTSML NHPMCAITQGVQFIEKTCPGATSRFTANFLILY 629 

I : II I : I :: I : II : I I : 

Db 610 RRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVI 648 

Qy 630 GFIPALVIL 638 

I : : I 

Db 649 GLSGGFMVL 657 



Search completed: February 27, 2004, 07:34:03 
Job time : 31.3006 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 

Run on: February 27, 2004, 06:40:43 ; Search time 36.1949 Seconds 
(without alignments) 
5683.620 Million cell updates/sec 

Title: US-09-989-981A-2 
Perfect score: 3369 
Sequence: 1 MGELPFLSPEGARGPHINRG...PALVILGIVIFKVRDYLISR 652 



Scoring table: BLOSUM62 
Gapop 10.0, Gapext 0.5 
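The search scores alignments with the Smith-Waterman ("sw") model, BLOSUM62 and affine gap penalties. A minimal Python sketch of that kind of local alignment scoring (Gotoh's recurrences) follows. It assumes a gap of length k costs Gapop + (k - 1) * Gapext, which may not be GenCore's exact convention, and it uses a hypothetical toy match/mismatch function, `toy_score`, in place of the full BLOSUM62 table.

```python
# Sketch of Smith-Waterman local alignment with affine gaps (Gotoh).
# Assumptions (not taken from GenCore): a gap of length k costs
# gap_open + (k - 1) * gap_ext, and `score` stands in for a BLOSUM62 lookup.
from typing import Callable


def sw_affine(a: str, b: str, score: Callable[[str, str], float],
              gap_open: float = 10.0, gap_ext: float = 0.5) -> float:
    """Return the best local alignment score of sequences a and b."""
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j), floored at 0 (local).
    # E: best score ending with b[j-1] aligned to a gap (gap in a).
    # F: best score ending with a[i-1] aligned to a gap (gap in b).
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_ext)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_ext)
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def toy_score(x: str, y: str) -> float:
    # Hypothetical substitution scores; a real search would use BLOSUM62.
    return 5.0 if x == y else -6.0


print(sw_affine("AAAATTTT", "AAAACTTTT", toy_score))  # prints 30.0
```

Here the best local alignment bridges the extra C with a one-residue gap (8 matches x 5 - 10 = 30), which beats any gapless alignment under these toy scores.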



Searched: 1017041 seqs, 315518202 residues 

Total number of hits satisfying chosen parameters: 1017041 

Minimum DB seq length: 0 
Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 
Maximum Match 100% 
Listing first 45 summaries 



Database : SPTREMBL_25:* 
1: sp_archea:* 
2: sp_bacteria:* 
3: sp_fungi:* 
4: sp_human:* 
5: sp_invertebrate:* 
6: sp_mammal:* 
7: sp_mhc:* 
8: sp_organelle:* 
9: sp_phage:* 
10: sp_plant:* 
11: sp_rodent:* 
12: sp_virus:* 
13: sp_vertebrate:* 
14: sp_unclassified:* 
15: sp_rvirus:* 
16: sp_bacteriap:* 
17: sp_archeap:* 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
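The report only says that Pred. No. comes from the total score distribution. One common way to obtain such a number, sketched below in Python, is to fit an extreme-value (Gumbel) distribution to the background scores and multiply its upper-tail probability by the number of sequences searched (1017041 in this run). The method-of-moments fit, the function names and the `background` scores are assumptions for illustration, not GenCore's documented procedure.

```python
# Sketch: a "Pred. No."-style expected count from a score distribution.
# Assumption: background scores follow a Gumbel distribution fitted by the
# method of moments; GenCore's actual fitting procedure is not given here.
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(scores):
    """Method-of-moments estimates (mu, beta) for a Gumbel distribution."""
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta


def predicted_number(score, mu, beta, n_searched):
    """Expected number of chance hits scoring >= `score`."""
    z = (score - mu) / beta
    # P(S >= score) = 1 - exp(-exp(-z)); expm1 keeps precision in the tail.
    p_tail = -math.expm1(-math.exp(-z))
    return n_searched * p_tail


# Hypothetical background scores, for illustration only.
background = [30.0, 34.5, 28.0, 41.0, 36.5, 31.0, 45.5, 33.0, 29.5, 38.0]
mu, beta = fit_gumbel(background)
for s in (40.0, 80.0, 120.0):
    print(s, predicted_number(s, mu, beta, n_searched=1017041))
```

Because the tail probability shrinks roughly exponentially with score, high-scoring hits such as RESULT 1 end up with very small Pred. No. values, as in the e-238 figure reported for it.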



SUMMARIES 

                   % 
Result           Query 
  No.   Score    Match  Length  DB  ID           Description 

ALIGNMENTS 



RESULT 1 
Q7TSR8 

ID Q7TSR8 PRELIMINARY; PRT; 652 AA. 

AC Q7TSR8; 

DT 01-OCT-2003 (TrEMBLrel. 25, Created) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 



DE ATP-binding cassette sub-family G member 5. 

GN ABCG5. 

OS Mus musculus (Mouse). 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=I/LnJ; TISSUE=Liver; 

RA Wittenburg H., Lyons M.A., Li R., Churchill G.A., Carey M.C., 

RA Paigen B. ; 

RT "Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone 

RT Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred 

RT Mice."; 

RL Submitted (DEC-2002) to the EMBL/GenBank/DDBJ databases. 

DR EMBL; AY195872; AAO45093.1; 

KW ATP-binding. 

SQ SEQUENCE 652 AA; 73236 MW; 0125FB617DE296B9 CRC64; 

Query Match 98.4%; Score 3315; DB 11; Length 652; 

Best Local Similarity 98.3%; Pred. No. 1.2e-238; 

Matches 641; Conservative 5; Mismatches 6; Indels 0; Gaps 0; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I! I I I I I I I I I I I I I I I I I I I I I I 
Db 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 111111:1111111 
Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRCTGTLEGDVFVNGCE 120 

Qy 121 LRRDQFQDCFSYVXQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I II I I I I I I I I I I I II I 
Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180 

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240 

I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 181 VADQVIGSYNFGGISSGERRRVSIAAQLLQDPKWIMLDEPTTGLDCMTANQIVLLLAELA 240 

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
Db 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360 

I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
Db 301 FYMDLTSVDTQSREREIETYKRVQMLESAFKESDIYHKILENIERARYLKTLPTVPFKTK 360 

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I 
Db 421 VGLLYQFVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHALPFSIIAT 480 



Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540 
I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540 



Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600 

I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I : I I I I II 
Db 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGESNTTMLNHPM 600 

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652 

I I I I I I I : I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 CAITQGVEFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652 
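The percentages in each hit header appear to follow directly from the raw numbers. Checked against this report, Query Match equals Score divided by the Perfect score (3315 / 3369 = 98.4% for RESULT 1), and Best Local Similarity equals Matches divided by the sum of Matches, Conservative, Mismatches and Indels (641 / 652 = 98.3%). The Python sketch below encodes those inferred relationships; the function names are hypothetical and the formulas are not quoted from GenCore documentation.

```python
# Sketch: how the percentages in each hit header relate to the raw counts.
# These formulas are inferred by checking them against this report's own
# figures; they are not quoted from GenCore documentation.


def query_match(score: float, perfect_score: float) -> float:
    """Score as a percentage of the query's self-comparison (Perfect) score."""
    return 100.0 * score / perfect_score


def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Identical positions as a percentage of all aligned columns."""
    return 100.0 * matches / (matches + conservative + mismatches + indels)


# RESULT 1 (Q7TSR8) against the Perfect score of 3369:
print(round(query_match(3315, 3369), 1))              # prints 98.4
print(round(best_local_similarity(641, 5, 6, 0), 1))  # prints 98.3
```

The same arithmetic reproduces the other headers, for example 717.5 / 3369 = 21.3% and 196 / 639 = 30.7% for RESULT 2.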

RESULT 2 
Q8CIQ5 



ID Q8CIQ5 PRELIMINARY; PRT; 672 AA. 

AC Q8CIQ5; 

DT 01-MAR-2003 (TrEMBLrel. 23, Created) 

DT 01-MAR-2003 (TrEMBLrel. 23, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Sterolin 2. 

GN ABCG8 . 

OS Rattus norvegicus (Rat) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Rattus. 

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Sprague-Dawley; 

RA Yu H., Lu K., Lee M., Pandit B., Patel S.B.; 

RT "The rat Abcg5 and Abcg8: characterization, chromosomal assignment and 

RT genetic variation in sitosterolemic rats."; 

RL Submitted (AUG-2002) to the EMBL/GenBank/DDBJ databases. 

DR EMBL; AY145899; AAN64276.1; -. 

DR GO; GO: 0016020; C:membrane; IEA. 

DR GO; GO: 0005524; F: ATP binding; IEA. 

DR GO; GO: 0004009; F: ATP-binding cassette (ABC) transporter acti. . .; IEA. 

DR GO; GO: 0006810; P: transport; IEA. 

DR InterPro; IPR003439; ABC_transporter . 

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter ; 1. 

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1. 

DR PROSITE; PS50893; ABC_TRANSPORTER_2 ; 1. 

SQ SEQUENCE 672 AA; 75906 MW; 2FE0846E71BD9D47 CRC64; 



Query Match 21.3%; Score 717.5; DB 11; Length 672; 

Best Local Similarity 30.7%; Pred. No. 6.3e-45; 

Matches 196; Conservative 123; Mismatches 245; Indels 75; Gaps 19; 

Qy 23 SSLEQGSVTGTEARHS LGVLHVSYSV— SNRVGPW WNIK 59 

I I I I I I : I : : I II : : I I : : : I I I I : 

Db 21 SSL-QDSVFSSESDNSLYFTYSGQSNTLEVRDLTYQVDMASQV- PWFEQLAQFKLPWRSR 78 

Qy 60 SCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGC 119 

I II I : : : I : I I I : : I : I I : I I : I I I I I : I I | : : : : | | 

Db 79 GSQDSWDLGI-RNLSFKVRSGQMLAIIGSAGCGRATLLDVITGRDHGGKMKSGQIWINGQ 137 

Qy 120 ELRRDQFQDCFSYVXQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSL 178 

I I : : I I I I : I I I I I I I : I : I : : I : I : I I I : I I I 

Db 138 PSTPQLIQKCVAHVRQQDQLLPNLTVRETLTFIAQMRLPKTFSQAQRDKRVEDVIAELRL 197 



Qy 179 SHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAE 238 

I : : I : I : I I I I I I I I I I I I : I : : : I I I I I : I I I I I : : I I : 

Db 198 RQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVRTLSR 257 

Qy 239 LARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNP 298 

M : : I : I : : : : I I I I I : : I : I I : : : I I : : I : I : : I : I I I I I : I I I 
Db 258 LAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGVAQHMVQYFTSIGYPCPRYSNP 317 

Qy 299 FDFYMDLTSVDTQSREREIETYKRVQMLECAFKE SDIYHKI-LENIERARYLKTL 352 

I I I : I I I I : I : I : I : I : I : : : : I II I I :::: I : 

Db 318 ADFYVDLTSIDRRSKEQEVATMEKARLLAALFLEKVQGFDDFLWKAEAKSLDTGTYAVSQ 377 

Qy 353 PMVPFKTKDP P GMFGKL GVL L RRVT RN LMRN KQ AVI MRLVQN L I MGL FL I FY 404 

: I : I III: I : I I I I : : : : : I I : I 

Db 378 TL TQDTNCGTAAELPGMIQQFTTLIRRQISNDFRDLPTLFIHGAEACLMSLIIGFL 433 

Qy 405 LLRVQNNT LKGAVQD RVGLL YQLVGAT P YT GMLN AVN L F PML RAVS DQ E S Q D GL YHKWQM 464 

: I : I ||:: I : : I : I : I : : I : I I I I 

Db 434 YYGHADKPL — S FMDMAALLFMI GALI P FNVI LDWS KCHS ERS LLYYELEDGL YTAGP Y 491 

Qy 465 LLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL T 518 

I I I I I : I : I I I I | | : : | I 

Db 492 FFAKVLGELPEHCAYVIIYGMPIYWLTNLRP GPELFLLHFMLLWLWFCCRT 543 

Qy 519 LVLLGIVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILV 577 

: I I ::| : : :| |: |: : I : :| ::| h 

Db 544 MALAASAMLPTFHMSSFCCNALYNSFYLTAGFMINLNNLWIVPAWISKMSFLRWCFSGLM 603 

Qy 578 VNEFYG LNFTCGGSN — TSM-LN-HPMCAI 603 

: I I II: I I : I I I I I : I I 

Db 604 QIQFNGHIYTTQIGNLTFSVPGDAMVTAMDLNSHPLYAI 642 



RESULT 3 
Q7TSR7 

ID Q7TSR7 PRELIMINARY; PRT; 672 AA. 

AC Q7TSR7; 

DT 01-OCT-2003 (TrEMBLrel. 25, Created) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette sub-family G member 8. 

GN ABCG8 . 

OS Mus musculus (Mouse) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=I/LnJ; TISSUE=Liver ; 

RA Wittenburg H., Lyons M.A., Li R., Churchill G.A., Carey M.C., 

RA Paigen B. ; 

RT "Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone 

RT Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred 

RT Mice."; 

RL Submitted (DEC-2002) to the EMBL/GenBank/DDBJ databases. 



DR EMBL; AY196215; AAO45095.1; 

KW ATP-binding. 

SQ SEQUENCE 672 AA; 75805 MW; E5B30B5890200A41 CRC64; 



Query Match 20.9%; Score 705.5; DB 11; Length 672; 

Best Local Similarity 29.2%; Pred. No. 4.9e-44; 

Matches 195; Conservative 131; Mismatches 244; Indels 97; Gaps 19; 

Qy 27 QGSVTGTEARHSLGVLHVSYS VSNRVGPW WNIKS 60 

I | : : I : : I I : : I I : : : : I I I I I 

Db 24 QDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQV-PWFEQLAQFKIPWRSHS 79 

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

I : | :::| : I I I : : I : I I I I I : : I I I I : I I 
Db 80 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 138 

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLS 179 

: I :: I I I I : I I I I I I I : I : I I : I ': I : I I I : I I I 
Db 139 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 198 

Qy 180 HVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL 239 

I : : I : I : I I I I I I I I I I I I : I : : : I I I I I : I I I I I : : I I .: I 

Db 199 QCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 258 

Qy 240 ARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

|: :|:|::::|||||::|: || : ::| I :: I ::|: :| : hill MM 
Db 259 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 318 

Qy 300 DFYMDLTSVDTQSREREIETYKRVQMLECAFKE SDIYHKI-LENIERARYLKTLP 353 

I I I : I I I I: I : h I I I: I : : I I II I I : : - ■ : I 

Db 319 DFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT 378 

Qy 354 MVPFKTKDP P GMFGKLGVLLRRVT RNLMRN KQAVIMRLVQNL IMGLFL I F YL 405 

: hi III: 1 = 1! II: ::: : : I I : I 

Db 379 L TQDTDCGTAAELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF — 432 

Qy 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465 

I : : : I I I : : I : : I : I : h : I : I I I I 

Db 433 LYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYF 492 

Qy 466 LAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL TL 519 

I :| II :h: II II I || ||: :| |: 

Db 493 FAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HLLLVWLWFCCRTM 544 

Qy 520 VLLGIVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILW 578 

I I : : I : : : I I : |: : I : : I : : I I : 

Db 545 ALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQ 604 

Qy 579 NEFYGL NFTCGGSNTSML NHPMCA ITQGVQFIEKTCPGATSRFT 622 

: I I I II : h : I h I I I : : 
Db 605 IQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISY 651 

Qy 623 ANFLILY 629 

I I I I 

Db 652 -GFLFLY 657 



RESULT 4 
Q8R543 

ID Q8R543 PRELIMINARY; PRT; 673 AA. 
AC Q8R543; 
DT 01-JUN-2002 (TrEMBLrel. 21, Created) 
DT 01-JUN-2002 (TrEMBLrel. 21, Last sequence update) 
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 
DE Sterolin 2. 
GN ABCG8. 
OS Mus musculus (Mouse). 
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 
OX NCBI_TaxID=10090; 
RN [1] 
RP SEQUENCE FROM N.A. 
RC STRAIN=129/Sv; 
RA Lu K., Zhou Y., Lee M.-H., Patel S.B.; 
RT "Molecular cloning, genomic structure and characterization of novel 
RT mouse head-to-head tandem ABC transporters."; 
RL Submitted (FEB-2001) to the EMBL/GenBank/DDBJ databases. 
DR EMBL; AF351811; AAL82898.1; 
DR EMBL; AF351799; AAL82898.1; JOINED. 
DR EMBL; AF351800; AAL82898.1; JOINED. 
DR EMBL; AF351801; AAL82898.1; JOINED. 
DR EMBL; AF351802; AAL82898.1; JOINED. 
DR EMBL; AF351803; AAL82898.1; JOINED. 
DR EMBL; AF351804; AAL82898.1; JOINED. 
DR EMBL; AF351805; AAL82898.1; JOINED. 
DR EMBL; AF351807; AAL82898.1; JOINED. 
DR EMBL; AF351808; AAL82898.1; JOINED. 
DR EMBL; AF351809; AAL82898.1; JOINED. 
DR EMBL; AF351810; AAL82898.1; JOINED. 
DR GO; GO:0016020; C:membrane; IEA. 
DR GO; GO:0005524; F:ATP binding; IEA. 
DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA. 
DR GO; GO:0006810; P:transport; IEA. 
DR InterPro; IPR003439; ABC_transporter. 
DR Pfam; PF00005; ABC_tran; 1. 
DR ProDom; PD000006; ABC_transporter; 1. 
DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1. 
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1. 
SQ SEQUENCE 673 AA; 76008 MW; FA08340445DF259C CRC64; 

Query Match 20.9%; Score 703.5; DB 11; Length 673; 

Best Local Similarity 28.7%; Pred. No. 6.9e-44; 

Matches 193; Conservative 130; Mismatches 242; Indels 107; Gaps 17; 



Qy 27 QGSVTGTEARHSLGVLHVSYS VSNRVGPW WNIKS 60 

I I : : I : : I I : : I I : : : : I M I I 

Db 25 QDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQV-PWFEQLAQFKIPWRSHS 80 

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

I : I : : : I : I I I : : I : I I I I I : : I I I I : I I | : : : : | | 

Db 81 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 139 

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLS 179 
: I :: I I I I : I I I I I I I : I : I I : I : I : I I I : I I I 



Db 140 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 199 

Qy 180 HVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL 239 

I : : I : I : I II I I I I I I I I I : I : : : I I I I I : I I I I I : : I I : I 
Db 200 QCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 259 

Qy 240 ARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

I : : | : | : : : : | | I I I : : I : I I : : : I I : : I : : I : : I : hill : I I I 
Db 260 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 319 

Qy 300 DFYMDLTSVDTQSREREIETYKRVQMLECAFKE SDIYHKI-LENIERARYLKTLP 353 

I I h I I I h I : h I I I : I : : I I II I I : : : : : I ■ 

Db 320 DFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT 379 

Qy 354 MVPFKTKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYL 405 

: |:| III : I : I I I h ::: : :| I : I 

Db 380 L- TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF — 433 

Qy 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465 

I : : : I I I : : h : h h h : I : I I I I 

Db 434 LYYGHGT^KQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGPYF 493 

Qy 466 LAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIV 525 

I : I I I : I : : II II II III: 

Db 494 FAKI LGELPEHCAYVI I YAMPI YWLTNLRP VPELFLLHFLLVWLWF 540 

Qy 526 QNPNIVNSIVALLS ISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCC 573 

I : : hi : : : I I : I : : I : : I : : I 

Db 541 CCRNMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 600 

Qy 574 EILWNEFYGL NFTCGGSNTSML NHPMCA ITQGVQFIEKTCPGA 617 

h : I I III : h : I h I I h : 
Db 601 SGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISY 652 

Qy 618 TSRFTANFLILY 629 

I I I I 

Db 653 GFLFLY 658 



RESULT 5 
Q7TSR6 

ID Q7TSR6 PRELIMINARY; PRT; 672 AA. 

AC Q7TSR6; 

DT 01-OCT-2003 (TrEMBLrel. 25, Created) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette sub-family G member 8. 

GN ABCG8 . 

OS Mus musculus (Mouse). 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=PERA/Ei; TISSUE=Liver; 

RA Wittenburg H., Lyons M.A., Li R., Churchill G.A., Carey M.C., 

RA Paigen B. ; 



RT "Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone 

RT Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred 

RT Mice."; 

RL Submitted (DEC-2002) to the EMBL/GenBank/DDBJ databases. 

DR EMBL; AY196216; AAO45096.1; 

KW ATP-binding. 

SQ SEQUENCE 672 AA; 75867 MW; CAB720502EA8FE21 CRC64; 

Query Match 20.8%; Score 701.5; DB 11; Length 672; 

Best Local Similarity 29.1%; Pred. No. 9.8e-44; 

Matches 194; Conservative 131; Mismatches 245; Indels 97; Gaps 19; 

Qy 27 QGSVTGTEARHSLGVLHVSYS VSNRVGPW WNIKS 60 

I I : : I : : I I : : I I : : : : | | I I I 

Db 24 QDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQV-PWFEQLAQFKIPWRSHS 79 

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

I : I : : : I : I I I : : I : I I I I I : : I I I I : II I : : : : I I 

Db 80 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 138 

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLS 179 

: I :: I I I I : I I I I I I I : I : I I : I : I : I I I : I I I 
Db 139 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 198 

Qy 180 HVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL 239 

I : : I : I : I I I II I I I I I I I : I : : : I I I I I : II I I I : : I hi 

Db 199 QCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 258 

Qy 240 ARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

| : : I : I : : : : I I I I I : : I : II : : : I I : : I : : I : : I : hill : I I I 
Db 259 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 318 

Qy 300 DFYMDLT SVDTQS REREI ET YKRVQMLECAFKE SDIYHKI-LENIERARYLKTLP 353 

I I h I I I h I : h I I h I :: I I II I I : : : = : I 

Db 319 DFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT 378 

Qy 354 MVPFKTKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYL 405 

: |:| III: h I I II: ::: : : I I : I 

Db 379 L TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF — 432 

Qy 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465 

I : : : I I h : h : I : h I : : I : I I I I 

Db 433 LYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYF 492 

Qy 466 LAYVLHVLPFSVIATVI FSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL TL 519 

1:111 : I : : II II I I I I : : I h 

Db 493 FAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLLVWLWFCCRTM 544 

Qy 520 VLLGIVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILW 578 

I I ::| : : :| |: |: : I : :| ::| h 

Db 545 ALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQ 604 

Qy 579 NEFYGL NFTCGGSNTSML NHPMCA ITQGVQFIEKTCPGATSRFT 622 

: I I III : h : I h I I h : 
Db 605 IQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISY 651 



Qy 623 ANFLILY 629 

I I I I 

Db 652 -GFLFLY 657 



RESULT 6 
Q8IX16 



ID Q8IX16 PRELIMINARY; PRT; 655 AA. 

AC Q8IX16; 

DT 01-MAR-2003 (TrEMBLrel. 23, Created) 

DT 01-MAR-2003 (TrEMBLrel. 23, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette protein ABCG2. 

GN ABCG2 . 

OS Homo sapiens (Human) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE FROM N.A. 

RA Yoshikawa M., Yabuuchi H., Ikegami Y., Ishikawa T.; 

RL Submitted (DEC-2001) to the EMBL/GenBank/DDBJ databases. 

DR EMBL; AF463519; AAO14617.1; -. 

DR GO; GO:0016020; C:membrane; IEA. 

DR GO; GO: 0005524; F: ATP binding; IEA. 

DR GO; GO: 0004009; F: ATP-binding cassette (ABC) transporter acti. . .; IEA. 

DR GO; GO: 0000166; F:nucleotide binding; IEA. 

DR GO; GO:0006810; P : transport ; IEA. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter. 

DR InterPro; IPR006162; Ppantne_S. 

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1. 

DR SMART; SM00382; AAA; 1. 

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1. 

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

KW ATP-binding. 

SQ SEQUENCE 655 AA; 72314 MW; A8AF60B591D4C5A8 CRC64; 



Query Match 20.6%; Score 692.5; DB 4; Length 655; 

Best Local Similarity 29.0%; Pred. No. 4.4e-43; 

Matches 181; Conservative 141; Mismatches 247; Indels 55; Gaps 16; 

Qy 25 LEQGSVTGTEARHS LGVLHVS YSVSNRVGPWWNI KS CQQKWDRQI LKDV 73 

: I |: I I I I :: | | : I |:: :::| I :: 

Db 12 VSQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSG FLPCRKPVEKEILSNI 67 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133 

: :: I : I I I : I I I : : I I I : : I :| I |:| :|| I |: II 
Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLINGAP-RPANFKCNSGYV 124 

Qy 134 LQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFG 192 

: I I I : : I I I I I I : : : I I I : : I : : : I : I I I III : I : 
Db 125 VQDDVVMGTLTVRENLKFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIR 184 

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252 

I : I I I I : I I I : I : I I :: I I I I I I I I I I I I : : I I I : : : : I : I : M I 
Db 185 GVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQ 244 



Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311 

|| :|: || : :| I |:| I :| ||:| : II I : : I I I I : : I : : I : 
Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304 

Qy 312 -SREREIETYKRVQMLECAFKESDIYHKI LENI ERARYLKT 351 

: || I :| :::| : :: : I: I I: : : 

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363 

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411 

: : | :| : :| :||: I II I ::: :::|| : ::|: 

Db 364 FKEISYTT S FCHQLRWVS KRS FKNLLGNPQAS I AQI I VTWLGLVI GAI YFGLKND 419 

Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470 

: : | : | | : | : I : : :: I I I I : : : I II I : I 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476 

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530 

: I I : : : : I I : : I : I I I I : I I : ' : : : I I : : 

Db 477 DLLPMRMLPSI I FTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAIAAGQSV 533 

Qy 531 VNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTC 588 

I : | : : | : : I I I : I : : I I I I : : I I I I I I I I I 

Db 534 VSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592 

Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612 

I I : I I I ::: I 

Db 593 PGLNATGNNPCNYATCTGEEYLVK 616 



RESULT 7 
Q96TA8 

ID Q96TA8 PRELIMINARY; PRT; 655 AA. 

AC Q96TA8; 

DT 01-DEC-2001 (TrEMBLrel. 19, Created) 

DT 01-DEC-2001 (TrEMBLrel. 19, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette superfamily G (White) member 2 (Hypothetical 

DE protein) . 

GN ABCG2 . 

OS Homo sapiens (Human) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=21201983; PubMed=11306452; 

RA Komatani H., Kotani H., Hara Y., Nakagawa R., Matsumoto M., 

RA Arakawa H., Nishimura S.; 

RT "Identification of breast cancer resistant protein/mitoxantrone 

RT resistance/placenta-specific, ATP-binding cassette transporter as a 

RT transporter of NB-506 and J-107088, topoisomerase I inhibitors with an 

RT indolocarbazole structure."; 

RL Cancer Res. 61:2827-2832(2001). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Pancreatic carcinoma; 



RA Strausberg R. ; 

RL Submitted (JAN-2002) to the EMBL/GenBank/DDBJ databases. 

DR EMBL; AB051855; BAB46933.1; -. 

DR EMBL; BC021281; AAH21281.1; 

DR GO; GO: 0016020; C:membrane; IEA. 

DR GO; GO: 0005524; F: ATP binding; IEA. 

DR GO; GO: 0004009; F: ATP-binding cassette (ABC) transporter acti. . .; IEA. 

DR GO; GO:0006810; P:transport; IEA. 

DR InterPro; IPR003439; ABC_transporter . 

DR InterPro; IPR006162; Ppantne_S . 

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1. 

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1. 

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

KW Hypothetical protein; ATP-binding. 

SQ SEQUENCE 655 AA; 72314 MW; A8AF66B96034C5A8 CRC64; 

Query Match 20.5%; Score 691.5; DB 4; Length 655; 

Best Local Similarity 29.0%; Pred. No. 5.3e-43; 

Matches 181; Conservative 141; Mismatches 247; Indels 55; Gaps 16; 



Qy 25 LEQGSVTGTEARHS LGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDV 73 

: | | : | I I | :: | | : 1 |:: :::| | :: 

Db 12 VSQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSG FLPCRKPVEKEILSNI 67 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133 

: :: | : I I I : I I I : : 1 1 1 : : 1 :| I |:| :|| 1 |: II 

Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLINGAP-RPANFKCNSGYV 124 

Qy 134 LQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFG 192 

: I 1 1 : : 1 1 1 1 1 1 : : : 1 1 1 : : 1 : : : hill III : 1 : 

Db 125 VQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIR 184 

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252 

I : | | | I : I I I : 1 : 1 1 :: 1 1 1 1 1 1 1 1 1 1 1 1 : : 1 1 1 : : : : 1 : 1 : 1 1 1 

Db 185 GVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQ 244 

Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311 

1 1 :|: || : :| 1 |:| 1 :| ||:| : II 1 ll::|: : h 

Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304 

Qy 312 -SREREIETYKRVQMLECAFKESDIYHKI LENIERARYLKT 351 

: | | | : I : : : 1 : : : : 1 : 1 1 : : : 

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363 

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411 

: : | :| : :| :||: I || I ::: :::|| : ::|: 

Db 364 FKEISYTT SFCHQLRWVSKRSFKNLLGNPQASIAQIIVTVVLGLVIGAIYFGLKND 419 

Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470 

: :|:| |:|: | :: ::|| | | : : : 1 1 1 1 :| 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476 

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530 

: | | : : : : | | : : | : 1 1 1 1 : 1 1 : : : : : 1 1 : : 

Db 477 DLLPMRMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAIAAGQSV 533 


Qy 531 VNSIVALLSIS — GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGLNFTC 588 

I : | : : I : : I I I : I : : I III: : I I I I I I I I I 

Db 534 VSVATLLMTICFVFMMI FSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592 

Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612 

I I : I I I ::: I 

Db 593 PGLNATGNNPCNYATCTGEEYLVK 616 



RESULT 8 
Q8MIB3 

ID Q8MIB3 PRELIMINARY; PRT; 656 AA. 

AC Q8MIB3; 

DT 01-OCT-2002 (TrEMBLrel. 22, Created)

DT 01-OCT-2002 (TrEMBLrel. 22, Last sequence update)

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Brain multidrug resistance protein. 

GN BMDP.

OS Sus scrofa (Pig).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Cetartiodactyla; Suina; Suidae; Sus. 

OX NCBI_TaxID=9823; 

RN [1] 

RP SEQUENCE FROM N.A.

RX MEDLINE=22050127; PubMed=12054514;

RA Eisenblaetter T., Galla H.J.;

RT "A new multidrug resistance protein at the blood-brain barrier.";

RL Biochem. Biophys. Res. Commun. 293:1273-1278(2002).

DR EMBL; AJ420927; CAD12785.1; -. 

DR PIR; JC7860; JC7860. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

KW ATP-binding. 

SQ SEQUENCE 656 AA; 72392 MW; 118ADD5B53D9D67F CRC64; 

Query Match 20.3%; Score 685; DB 6; Length 656; 

Best Local Similarity 29.7%; Pred. No. 1.6e-42; 

Matches 187; Conservative 130; Mismatches 228; Indels 84; Gaps 19; 

Qy 31 TGTEARHSLGVLHVSY-SVSNRVGPWWNIKS CQQKWDRQILKDVSLYIESGQIMCI 85 

: I : I I : I : : II Ml I : : : : : I I : : : : : I : I 

Db 24 SSNELKTSAGGAVLSFHDICYRV KVKSGFLFCRKTVEKEILTNINGIMKPG-LNAI 78 

Qy 86 LGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTV 145 

I I : I I I : : I I I : : I I I I : I : I I I I : I I : I I I : : I I I 

Db 79 LGPTGGGKSSLLDVLAARKDPHG-LSGDVLINGAP-RPANFKCNSGYVVQDDVVMGTLTV 136



Qy 146 RETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSI 204

M I:::| II : : I::: hill III M: f = I 111 = 1 II 

Db 137 RENLQFSAALRLPTTMTNHEKNERINMVIQELGLDKVADSKVGTQFIRGVSGGERKRTSI 196 

Qy 205 AAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKI 264

I : I : I I :: I I I I I I I I I I I I :: I I I : : : : I : I : I I I I I : I : I I : 
Db 197 AMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSIFKLFDSL 256 

Qy 265 AILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ S 312 

:| 11:11 I 11:1 : II I : : I I I I : : I : : I : : 
Db 257 TLLASGRLMFHGPAREALGYFASIGYNCEPYNNPADFFLDVINGDSSAWLSRADRDEGA 316 

Qy 313 REREIETYKRVQMLE--CAF KESDI YHKILENIERAR 347

: I I | ::: || |:| :| :: 

Db 317 QEPEEPPEKDTPLIDKLAAFYTNSSFFKDTKVELDQFSGGRKKKKSSVYKEVTYTTSFCH 376 

Qy 348 YLKTLPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFL--IFYL 405

I : : II : | | : I I I : : : : : I : I I : III 

Db 377 QLRWIS RRS FKNLLGNPQASVAQI I VT I ILGLVI GAI FYD 416 

Qy 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465 

I : I I : I : I I : I : I : : : : I I I : : : I II 

Db 417 LK NDP S G- 1 QNRAGVLFFLTTNQCFS S -VSAVELLWEKKLFI HEYI S GYYRVS S YF 471 

Qy 466 LAYVL-HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGI 524 

: I : I I : : : : II : : I : I I I I I I I : : : : : I I 

Db 472 FGKLLSDLLPMRMLPSIIFTCITYFLLGLKPAVGSFFIMMFTLM MVAYSASSMALAI 528

Qy 525 VQNPNIVNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFY 582

: : I : I : : I I : : I I I : I : : : I III: : I I I I I 

Db 529 AAGQSVVSVATLLMTISFVFMMIFSGLLVNLKTWPWLSWLQYFSIPRYGFSALQYNEFL 588

Qy 583 GLNFTCGGSNTSMLNHPMCAITQGVQFIE 611 

I II I I I : I II I :::| 

Db 589 GQNF-CPGLNVTTNNTCSFAICTGAEYLE 616 



RESULT 9 
Q96LD6 

ID Q96LD6 PRELIMINARY; PRT; 655 AA. 

AC Q96LD6; 

DT 01-DEC-2001 (TrEMBLrel. 19, Created) 

DT 01-DEC-2001 (TrEMBLrel. 19, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ABC transporter ABCG2.

GN ABCG2.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

OX NCBI_TaxID=9606;

RN [1] 

RP SEQUENCE FROM N.A. 

RA Schuetz J.D., Wall A.M., Sampath J., Sorrentino B., Du G.;

RT "The Human ABC Transporter, ABCG2, Transports Hoechst 33342 and 

RT Requires an Intact Walker A Motif."; 

RL Submitted (JAN-2001) to the EMBL/GenBank/DDBJ databases.

DR EMBL; AY017168; AAG52982.1;

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

KW ATP-binding. 

SQ SEQUENCE 655 AA; 72288 MW; B3B5DC02C095C4A8 CRC64; 

Query Match 20.3%; Score 683.5; DB 4; Length 655; 

Best Local Similarity 28.8%; Pred. No. 2.1e-42; 

Matches 180; Conservative 141; Mismatches 248; Indels 55; Gaps 16; 

Qy 25 LEQGSVTGTEARHS LGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDV 73 

: II: I I I I :: | | : | |:: :::|| :: 

Db 12 VSQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSG FLPCRKPVEKEILSNI 67 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133 

: : : I : I I I : I I I : : I I I : : I : I I I : I : I I I I : I I 

Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLINGAP-RPANFKCNSGYV 124 

Qy 134 LQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFG 192

: I I I : : I I II I I : : : I I I : : I : : : I : I I I III : I : 
Db 125 VQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIR 184

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252

I : I I I I : I I I : I : I I : : I I I I I I I I I I I I :: I I I : : : : I : I : I I I 

Db 185 GVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQ 244 

Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311 

II : I : II : : I I I : I I : I I I : I : I I I : : I I I I : : I : : I : 

Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304 

Qy 312 - S RERE I ET YKRVQMLECAFKES D I YH KI LENI ERARYLKT 351 

: | I I :| :::| : :: : I: I I: : : 

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363 

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411 

: : I : I : : I : I I : I I I I : : : : : : I I : : : I : 

Db 364 FKEISYTT SFCHQLRWVSKRSFKNLLGNPQASIAQIIVTVVLGLVIGAIYFGLKND 419

Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470

: : I : I I : I : I :: ::|| || : : : | I I I :| 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530

: | | : : : : I I : : I : I I I : I I : : : : : I I 
Db 477 DLLPMRMLPSIIFTCIVYFMLGLKAKADAFFVMMFTLM MVAYSASSMALAIAAGQSV 533



Qy 531 VNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGLNFTC 588

| : | :: | :: I I I : I : : I III: : I I I II I I I I 

Db 534 VSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592

Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612 

I I : I I I ::: I 

Db 593 PGLNATGNNPCNYATCTGEEYLVK 616 



RESULT 10 
Q9R004 

ID Q9R004 PRELIMINARY; PRT; 657 AA. 

AC Q9R004; 

DT 01-MAY-2000 (TrEMBLrel. 13, Created) 

DT 01-MAY-2000 (TrEMBLrel. 13, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Breast cancer resistance protein 1.

GN ABCG2 OR BCRP1.

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A.

RC STRAIN=FVB; TISSUE=Liver;

RX MEDLINE=99413474; PubMed=10485464;

RA Allen J.D., Brinkhuis R.F., Wijnholds J., Schinkel A.H.;

RT "The mouse Bcrp1/Mxr/Abcp gene: amplification and overexpression in

RT cell lines selected for resistance to topotecan, mitoxantrone, or 

RT doxorubicin."; 

RL Cancer Res. 59:4237-4241(1999). 

DR EMBL; AF140218; AAD54216.1; 

DR MGD; MGI:1347061; Abcg2.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

KW ATP-binding. 

SQ SEQUENCE 657 AA; 73021 MW; 207B70BC272CC0D5 CRC64;

Query Match 20.1%; Score 677; DB 11; Length 657; 

Best Local Similarity 29.7%; Pred. No. 6.4e-42; 

Matches 189; Conservative 135; Mismatches 236; Indels 76; Gaps 21; 

Qy 15 PHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS CQQKWDRQILK 71 

| : I : : : I : I I I I : : I I : I I : : : : : I I 

Db 21 P RMN S RAVRT LAEGDV LSFHHITYRV KVKSGFLVRKTVEKEILS 64 



Qy 72 DVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFS 131 

I : : : : I : I I I : I I I : : I I I : : I I I I : I : I I : I : I 

Db 65 DINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPKG-LSGDVLINGAP-QPAHFKCCSG 121 

Qy 132 YVLQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYN 190

Ihl II : :|IMI |:::| I I : : |::: :: II I III :|: 
Db 122 YVVQDDVVMGTLTVRENLQFSAALRLPTTMKNHEKNERINTIIKELGLEKVADSKVGTQF 181

Qy 191 FGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTI 250

111111:111 : I : I I : : I I I I I I I I I I I I : : I I I : : : : I : I : I 
Db 182 IRGISGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSI 241 

Qy 251 HQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDT 310 

I I I I : I : I I : : I I : I I I I , : : I : I : I I I : : I I I I : : I : : I : 
Db 242 HQPRYSIFKLFDSLTLLASGKLVFHGPAQKALEYFASAGYHCEPYNNPADFFLDVINGDS 301 

Qy 311 QS REREIETYKR VQMLECAFKESDIYHKILENIERARYLKTLPMV 355 

: I : : I I : : I : I I I I III 

Db 302 SAVMLNREEQDNEANKTEEPSKGEKPVIENLSEFYINSAIYGETKAELDQLPGA 355

Qy 356 PFKT KDP---PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFL--IFYL 405

I I : I : I : I I : I I : I I I : : I : : I : I I : I : : 

Db 356 QEKKGTSAFKEPVYVTSFCHQLRWIARRSFKNLLGNPQASVAQLIVTVILGLIIGAIYFD 415 

Qy 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465

I: :|:| |:|: | :: ::|| || : : : | | | 

Db 416 LKYD AAGMQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYF 470

Qy 466 LAYVL-HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGI 524

I : : I I : : I I I : : I : I I I I I I : : : : : I I 
Db 471 FGKVMSDLLPMRFLPSVIFTCILYFMLGLKKTVDAFFIMMFTLI MVAYTASSMALAI 527

Qy 525 VQNPNIVNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFY 582

: : I : I : : I : : : : I I : I : : : I III: : I I III 

Db 528 ATGQSVVSVATLLMTIAFVFMMLFSGLLVNLRTIGPWLSWLQYFSIPRYGFTALQYNEFL 587

Qy 583 GLNFTCGGSN TSMLNHPMCA ITQGVQ 608 

I I I I I | : : : : | III:: 

Db 588 GQEF-CPGFNVTDNSTCVNSYAICTGNEYLINQGIE 622 



RESULT 11 
Q7TMS5 

ID Q7TMS5 PRELIMINARY; PRT; 657 AA. 

AC Q7TMS5; 

DT 01-OCT-2003 (TrEMBLrel. 25, Created)

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 2. 

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=C57BL/6NCr; TISSUE=Hematopoietic Stem Cell; 

RX MEDLINE=22388257; PubMed=12477932;



RA Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,

RA Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D.,

RA Altschul S.F., Zeeberg B., Buetow K.H., Schaefer C.F., Bhat N.K.,

RA Hopkins R.F., Jordan H., Moore T., Max S.I., Wang J., Hsieh F.,

RA Diatchenko L., Marusina K., Farmer A.A., Rubin G.M., Hong L.,

RA Stapleton M., Soares M.B., Bonaldo M.F., Casavant T.L., Scheetz T.E.,

RA Brownstein M.J., Usdin T.B., Toshiyuki S., Carninci P., Prange C.,

RA Raha S.S., Loquellano N.A., Peters G.J., Abramson R.D., Mullahy S.J.,

RA Bosak S.A., McEwan P.J., McKernan K.J., Malek J.A., Gunaratne P.H.,

RA Richards S., Worley K.C., Hale S., Garcia A.M., Gay L.J., Hulyk S.W.,

RA Villalon D.K., Muzny D.M., Sodergren E.J., Lu X., Gibbs R.A.,

RA Fahey J., Helton E., Ketteman M., Madan A., Rodrigues S., Sanchez A.,

RA Whiting M., Madan A., Young A.C., Shevchenko Y., Bouffard G.G.,

RA Blakesley R.W., Touchman J.W., Green E.D., Dickson M.C.,

RA Rodriguez A.C., Grimwood J., Schmutz J., Myers R.M., Butterfield Y.S.,

RA Krzywinski M.I., Skalska U., Smailus D.E., Schnerch A., Schein J.E.,

RA Jones S.J., Marra M.A.;

RT "Generation and initial analysis of more than 15,000 full-length human 

RT and mouse cDNA sequences."; 

RL Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=C57BL/6NCr; TISSUE=Hematopoietic Stem Cell; 

RA Strausberg R.;

RL Submitted (JUN-2003) to the EMBL/GenBank/DDBJ databases.

DR EMBL; BC053730; AAH53730.1; 

KW ATP-binding. 

SQ SEQUENCE 657 AA; 72977 MW; DCD70C5D9FA2BA5F CRC64;



Query Match 20.1%; Score 676; DB 11; Length 657; 

Best Local Similarity 29.9%; Pred. No. 7.6e-42; 

Matches 190; Conservative 133; Mismatches 237; Indels 76; Gaps 21; 



Qy 15 PHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS CQQKWDRQILK 71

1 1 : : : 1 : 1 1 1 1 : : 1 1 : i 1 : : : : : I 1

Db 21 PRTNS RAVRT LAEGDV LSFHHITYRV KVKSGFLVRKTVEKEILS 64

Qy 72 DVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFS 131

1 : : : : 1 : 1 1 1 : 1 1 1 : : 1 1 1 : : 1 1 1 1 : 1 : 1 I : hi

Db 65 DINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPKG-LSGDVLINGAP-QPAHFKCCSG 121

Qy 132 YVLQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYN 190

1 1 : 1 1 1 : : 1 1 1 1 1 1 : : : 1 1 1 : : h : : : : 1 1 1 II 1 : h

Db 122 YVVQDDVVMGTLTVRENLQFSAALRLPTTMKNHEKNERINTIIKELGLEKVADSKVGTQF 181

Qy 191 FGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTI 250

III 111 = 1 II :h II :: lllllllll III ::IM :::: 1 :| =1

Db 182 IRGISGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSI 241

Qy 251 HQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDT 310

1 1 1 1 : 1 : 1 1 : : 1 h II 1 1 : : I : I : I 1 1 : : 1 1 1 h : h : h

Db 242 HQPRYSIFKLFDSLTLLASGKLVFHGPAQKALEYFASAGYHCEPYNNPADFFLDVINGDS 301

Qy 311 QS REREIETYKR VQMLECAFKESDIYHKILENIERARYLKTLPMV 355

: h : 1 1 : : | : | | | 1 III

Db 302 SAVMLNREEQDNEANKTEEPSKGEKPVIENLSEFYINSAIYGETKAELDQLPGA 355



Qy 356 PFKT KDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFL--IFYL 405

| | : | : I : I I : I I : I I I : : I : : I : I I : I : : 

Db 356 QEKKGTSAFKEPVYVTSFCHQLRWIARRSFKNLLGNPQASVAQLIVTVILGLIIGAIYFD 415 

Qy 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465

I : : | : | | : | : | : : :: I I | | : : : I II 

Db 416 LKYD AAGMQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYF 470

Qy 466 LAYVL-HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGI 524 

I : : I I : : | | | : | | : | | | I I I : :: : : I I 

Db 471 FGKVMSDLLPMRFLPSVIFTCVLYFMLGLKKTVDAFFIMMFTLI MVAYTASSMALAI 527

Qy 525 VQNPNIVNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFY 582

: : I : I : : I : : : : I I : I : : : I III: : I I II I 

Db 528 ATGQSVVSVATLLMTIAFVFMMLFSGLLVNLRTIGPWLSWLQYFSIPRYGFTALQYNEFL 587

Qy 583 GLNFTCGGSN TSMLNHPMCA ITQGVQ 608 

I I I I I | : :: :| III:: 

Db 588 GQEF-CPGFNVTDNSTCVNSYAICTGNEYLINQGIE 622 

RESULT 12 
Q80W57 



ID Q80W57 PRELIMINARY; PRT; 657 AA. 

AC Q80W57; 

DT 01-JUN-2003 (TrEMBLrel. 24, Created) 

DT 01-JUN-2003 (TrEMBLrel. 24, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ABC transporter ABCG2.

GN ABCG2.

OS Rattus norvegicus (Rat).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A.

RC STRAIN=wistar; 

RA Hori S., Ohtsuki S., Terasaki T.;

RT "Expression and regulation of ABCG2 at the rat blood-brain barrier."; 

RL Submitted (MAR-2003) to the EMBL/GenBank/DDBJ databases. 

DR EMBL; AB105817; BAC76396.1; 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

SQ SEQUENCE 657 AA; 72960 MW; C975C61A08489027 CRC64;

Query Match 19.6%; Score 659; DB 11; Length 657;



Best Local Similarity 29.3%; Pred. No. 1.4e-40; 

Matches 192; Conservative 132; Mismatches 248; Indels 84; Gaps 21; 



Qy 4 LPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKSCQQ 63

II :| III : | : | | I I : I : :: : 

Db 20 LPGMSSRGAR TLAEGDV LSFHHITYRVKVKSG--FLVRKTAE 59

Qy 64 KWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRR 123

I : I I I : : : : I : i I I : I I i : : I I I : : I I I I : I : I I : 

Db 60 K EILSDINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPRG-LSGDVLINGAP-QP 113 



Qy 124 DQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLSHVA 182

I : I I : I I I : : I I I I I I : : : I I I : : I : : : : : M I II 

Db 114 ANFKCSSGYVVQDDVVMGTLTVRENLQFSAALRLPKAMKTHEKNERINTIIKELGLDKVA 173

Qy 183 DQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARR 242

I : I : 11111:111 : I : I I : : I I I I I I I I I I I I : : I I I : : : : 

Db 174 DSKVGTQFTRGISGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQ 233 



Qy 243 DRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFY 302

I : I : I I I I I :|: || : :| I : I : I I :: I :l : II I ::|l II: 
Db 234 GRTIIFSIHQPRYSIFKLFDSLTLLASGKLMFHGPAQKALEYFASAGYHCEPYNNPADFF 293 

Qy 303 MDLTSVDTQS REREI ETYKRVQMLECAFKESDI YHKI LENI ERA 346 

: I : : I : : I : : I I : I I : I I : I 

Db 294 LDVINGDSSAVMLNRGEQDHEANKTEEPSKREKPIIENLAEFYINSTIYGETK 346



Qy 347 RYLKTLPMV PFKTKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMG 398

I I I : I :: I : I : I I : I I : I I I : : I : : I : I 

Db 347 AELDQLPVAQKKKGSSAFREPVYVTSFCHQLRWIARRSFKNLLGNPQASVAQLIVTVILG 406 

Qy 399 LFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGL 458

I : : : I : : I : I I : : I : I : : I I I I : : : I I 

Db 407 LIIGALYFGLKNDPT--GMQNRAGVFFFLTTNQCFTS-VSAVELFVVEKKLFIHEYISGY 463

Qy 459 YHKWQMLLA-YVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL 517 

I 1:11 : : | | : : : I : I I I I I I : : : 

Db 464 YRVSSYFFGKLVSDLLPMRFLPSVIYTCILYFMLGLKRTVEAFFIMMFTLI MVAYTA 520 



Qy 518 TLVLLGIVQNPNIVNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEI 575

: : I I ::|: |::| I ::: I I : |:: : I III: :| 

Db 521 SSMALAIAAGQSVVSVATLLMTISFVFMMLFSGLLVNLRTIGPWLSWLQ 580

Qy 576 LWNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTAN-FLILYG 630 

I I I I I I I I I : I I I : : I I : I I I 

Db 581 LQHNEFLGQEF-CPGLNVTM NSTCVNSYTICTGNDYLINQG 620 



RESULT 13 
Q80XF3 

ID Q80XF3 PRELIMINARY; PRT; 657 AA. 

AC Q80XF3; 

DT 01-JUN-2003 (TrEMBLrel. 24, Created) 

DT 01-JUN-2003 (TrEMBLrel. 24, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette transporter ABCG2.

GN ABCG2.

OS Rattus norvegicus (Rat).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. 

RA Shimano K., Satake M., Okaya A., Kitanaka J., Kitanaka N.,

RA Takemura M., Sakagami M., Terada N., Tsujimura T.;

RT "Hepatic Oval Cells Have the Side Population Phenotype Defined by

RT Expression of ATP-binding Cassette Transporter ABCG2/BCRP1.";

RL Am. J. Pathol. 0:0-0(2003). 

DR EMBL; AB094089; BAC75666.1; 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

KW ATP-binding. 

SQ SEQUENCE 657 AA; 72961 MW; 458980CC3903D5CE CRC64;

Query Match 19.5%; Score 657; DB 11; Length 657; 

Best Local Similarity 29.5%; Pred. No. 2e-40; 

Matches 194; Conservative 130; Mismatches 246; Indels 88; Gaps 22; 



Qy 4 LPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKSCQQ 63

II :| III :| :| 1 1 |::| I : 1 : :: :

Db 20 LPGMSSRGAR TLAEGDV LSFHHITYRVKVKSG--FLVRKTAE 59

Qy 64 KWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRR 123

1 : 1 1 1 : : :: 1 : 1 II : 1 1 1 : : 1 1 1 : : 1 1 1 hi Ml :

Db 60 K EILSDINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPRG-LSGDVLINGAP-QP 113

Qy 124 DQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLSHVA 182

I: 1 1 : 1 1 1 : : 1 1 1 1 1 |:::| II:: |::: :: || I ||

Db 114 ANFKCSSGYVVQDDVVMGTLTVRENLQFSAALRLPKAMKTHEKNERINTIIKELGLDKVA 173

Qy 183 DQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARR 242

1 : I : 111111:111 : 1 : 1 1 : : 1 1 II 1 1 1 1 1 1 1 1 :: 1 1 1 : : : :

Db 174 DSKVGTQFTRGISGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQ 233

Qy 243 DRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFY 302

1 : 1 : 1 1 1 1 1 : 1 : 1 1 : : 1 |:|:| 1 : : I : I : I I I : : I 1 1 I :

Db 234 GRTIIFSIHQPRYSIFKLFDSLTLLASGKLMFHGPAQKALEYFASAGYHCEPYNNPADFF 293

Qy 303 MDLTSVDTQS REREIETYKRVQMLECAFKESDI YHKILENI ERA 346

: 1 : : 1 : : 1 : : 1 1 : 1 1 : 1 1 : 1

Db 294 LDVINGDSSAVMLNRGEQDHEANKTEEPSKREKPIIENLAEFYINSTIYGETK 346

Qy 347 RYLKTLPMV PFKTKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLI 396

I I I • II I • I -1 * I I • I I • I I I I

Db 347 AELDQLPVAQKKKGSSPF--KEPVYVTSFCHQLRWIARRSFKNLLGNPQASVAQLIVTVI 404

Qy 397 MGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQD 456

: I I : : : I : : I : I I : I : I : : I I I I : : : I 

Db 405 LGLIIGALYFGLKNDPT--GMQNRAGVFLFLTTNQCFTS-VSAVELFVVEKKLFIHEYIS 461

Qy 457 GLYHKWQMLLA-YVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGE 515 

II I : I I : : I I : : : I : I I I II I : : : 
Db 462 GYYRVSSYFFGKLVSDLLPMRFLPSVIYTCILYFMLGLKRLVEAFFIMRFTLI MVAY 518 

Qy 516 FLTLVLLGIVQNPNIVNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCC 573

: : I I ::|: |::|| ::: || : |:: : I III: :| 

Db 519 TASSMAIAIAAGQSVVSVATLLMTISFVFMMLFSGLLVNLRTIGPWLSWLQYFSIPRYGF 578

Qy 574 EILWNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTAN-FLILYG 630 

I I I I I I I I I : I I I : : I I : I I I 

Db 579 TALQHNEFLGQEF-CPGLNVTM NSTCVNSYTICTGNDYLINQG 620 



RESULT 14 
Q80ST1 

ID Q80ST1 PRELIMINARY; PRT; 657 AA. 

AC Q80ST1; 

DT 01-JUN-2003 (TrEMBLrel. 24, Created) 

DT 01-JUN-2003 (TrEMBLrel. 24, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette protein G2 transcript variant B (ATP-binding 

DE cassette protein G2 transcript variant C) (ATP-binding cassette 

DE protein G2 transcript variant A).

GN ABCG2.

OS Rattus norvegicus (Rat).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus. 

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Sprague-Dawley; TISSUE=Liver;

RA Yabuuchi H., Ishikawa T.; 

RL Submitted (MAR-2002) to the EMBL/GenBank/DDBJ databases.

DR EMBL; AY089996; AAM09106.1; 

DR EMBL; AY089997; AAM09107.1; -. 

DR EMBL; AY089998; AAM09108.1; -. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

KW ATP-binding. 



SQ SEQUENCE 657 AA; 72960 MW; E194871E1C1AC201 CRC64; 



Query Match 19.5%; Score 657; DB 11; Length 657; 

Best Local Similarity 29.3%; Pred. No. 2e-40; 

Matches 192; Conservative 132; Mismatches 248; Indels 84; Gaps 21; 

Qy 4 LPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKSCQQ 63

II : I III : I : I I I I :: I I : I : :: : 

Db 20 LPGMSSRGAR TLAEGDV LSFHHITYRVKVKSG--FLVRKTAE 59

Qy 64 KWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRR 123 

I : I I I : : : : I : I I I : I I I : : I I I : : I I I I : I : I I • 

Db 60 K EILSDINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPRG-LSGDVLINGAP-QP 113 

Qy 124 DQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLSHVA 182 

|: ||:| II : MINI |:::| II:: |::: :: || I II 

Db 114 ANFKCSSGYVVQDDVVMGTLTVRENLQFSAALRLPKAMKTHEKNERINTIIKELGLDKVA 173

Qy 183 DQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARR 242 

I : | : 111111:111 : I : M : : I II I I I II I I I I :: I I I : : : : 

Db 174 DSKVGTQFTRGISGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQ 233 

Qy 243 DRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFY 302

I : I : I I I I I : I : I I : : I I : I : I I : : I : I : I I I : : I I I I : 
Db 234 GRTIIFSIHQPRYSIFKLFDSLTLLASGKLMFHGPAQKALEYFASAGYHCEPYNNPADFF 293

Qy 303 MDLTSVDTQS REREIETYKRVQMLECAFKESDI YHKILENI ERA 346 

:|: : I: : |:: I I : I |:||: I 

Db 294 LDVINGDSSAVMLNRGEQDHEANKTEEPSKREKPIIENLAEFYINSTIYGETK 346

Qy 347 RYLKTLPMV PFKTKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMG 398

| ||: | : : I : I : I I : I I : I I I : : I : : I : I 

Db 347 AELDQLPVAQKKKGSSAFREPVYVTSFCHQLRWIARRSFKNLLGNPQASVAQLIVTVILG 406 

Qy 399 LFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGL 458

| : : : | : : I : I I : : I : I : : I I I I : : : I I 

Db 407 LIIGALYFGLKNDPT--GMQNRAGVFFFLTTNQCFTS-VSAVELFVVEKKLFIHEYISGY 463

Qy 459 YHKWQMLLA-YVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL 517 

I I : I I : : I I : : : I : I I I II I : : : 
Db 464 YRVSSYFFGKLVSDLLPMRFLPSVIYTCLLYFMLGLKRTVEAFFIMMFTLI MVAYTA 520 

Qy 518 TLVLLGIVQNPNIVNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEI 575

: : I I : : I : I : : I I : : : I I : I : : : I III: : I 

Db 521 SSMAIAIAAGQSVVSVATLLMTISFVFMMLFSGLLVNLRTIGPWLSWLQYFSIPRYGFTA 580

Qy 576 LWNEFYGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTAN-FLILYG 630 

111111111:1 I I : : I I : I I I 

Db 581 LQHNEFLGQEF-CPGLNVTM NSTCVNSYTICTGNDYLINQG 620 



RESULT 15 
Q8T691 

ID Q8T691 PRELIMINARY; PRT; 801 AA. 

AC Q8T691; 

DT 01-JUN-2002 (TrEMBLrel. 21, Created)

DT 01-JUN-2002 (TrEMBLrel. 21, Last sequence update) 



DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ABC transporter AbcG1.

GN ABCG1.

OS Dictyostelium discoideum (Slime mold).

OC Eukaryota; Mycetozoa; Dictyosteliida; Dictyostelium.

OX NCBI_TaxID=44689;

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Ax4;

RA Anjard C., Loomis W.F.;

RT "Evolution of the ABC transporters of Dictyostelium.";

RL Submitted (FEB-2002) to the EMBL/GenBank/DDBJ databases.

CC -!- SIMILARITY: BELONGS TO THE ABC TRANSPORTER FAMILY. 

DR EMBL; AF482380; AAL91485.1; -. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Transport. 

SQ SEQUENCE 801 AA; 90052 MW; CCC4F0036CB195A3 CRC64; 

Query Match 19.0%; Score 640; DB 5; Length 801;

Best Local Similarity 27.5%; Pred. No. 4.7e-39; 

Matches 188; Conservative 126; Mismatches 249; Indels 120; Gaps 19; 

Qy 59 KSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNG 118

I : : I : I I I : : : : I I I I I I : I I I : I I I I I I I : : I I I : I : : : I I

Db 128 KGKKKKISKQILTNINGHIESGTIFAIMGPSGAGKTTLLDILAHRLNINGS--GTMYLNG 185

Qy 119 CELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNK--KVEAVMTEL 176

: : |: Mill : I I I I I I I I : I I : I I :|: :: I:

Db 186 NKSDFNIFKKLCGYVTQSDSLMPSLTVRETLNFYAQLKMPR-DVPLKEKLQRVQDIIDEM 244

Qy 177 SLSHVADQMIGSYN--FGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVL 234

I : | | : : | : : II I M I I I I : I : : I I I I :: I 1 I I I : I I I : : :

Db 245

Qy 235 LLAELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPE 294

I : I I : I : I I I I I I I I :: I I : : I I " :: I : I : I I I I I I

Db 305

Qy 295 HSNPFDFYMDL--TSVDTQS REREI 317

I I I I :: I I I I : I : III

Db 365

Qy 318

Db 425


Qy 349 LKTLPMV PFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLF 400 

: | | : : : I : : I I I I I : I : I : I : I I 

Db 483 NETLDNISKENRTDFKYEKTRGPNFLTQFSLLLGREVTNAKRHPMAFKVNLIQAIFQGLL 542

Qy 401 --LIFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGL 458

: : : | I : : : I I I I : : : : : :::::: I I : : : : I : 

Db 543 CGIVYYQLGLG Q S S VQ S RT G WAF I IMGVS F P AVMS T I HVF P D VI T I FLK D RAS GV 598 

Qy 459 YHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLG LYPEVARFGYFSAALLAPHLI 513 

| || : | :::::: I I I I : : I I 
Qy 514 GEFLTLVLLGIVQN PNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQ 569

I : I I :: : M : I : I I I : : I I I I : :: I I I : I 

Db 657 TCLSLGVLISSSVPNVQVGTAVAPLIVILFFLFSGFFINLNDVPGWLVWFPYISFF 712 

Qy 570 KYCCEI LVVNEFYGLNFTC GGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTA 623

: I I I : I I : : I I I II : I : I II I 

Db 713 RYMI EAAVI NAFKDVH FT CT DSQKI GG VCPVQYGNNVI E-NMGYDI DHFWR 762 

Qy 624 NFLILYGFIPALVILGIVIFKVR 646

I I I :| :| :: I:: 

Db 763 NVWILVLYIIGFRVLTFLVLKLK 785
Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
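The report does not say which distribution is fitted when Pred. No. is derived from the total score distribution. As a minimal sketch only, assuming the background scores follow a Gumbel (extreme-value) distribution fitted by the method of moments, the expected number of chance hits at or above a given score could be estimated as below. The helper `gumbel_pred_no` and the background scores are hypothetical; this is not the search program's own method.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def gumbel_pred_no(background_scores, score):
    """Estimate how many database hits would reach `score` by chance.

    Assumption (not taken from the report): background scores follow a
    Gumbel (extreme-value) distribution whose parameters are fitted by
    the method of moments.
    """
    n = len(background_scores)
    mean = statistics.fmean(background_scores)
    sd = statistics.stdev(background_scores)
    beta = sd * math.sqrt(6) / math.pi   # scale parameter
    mu = mean - EULER_GAMMA * beta       # location parameter
    z = (score - mu) / beta
    # P(S >= score) = 1 - exp(-exp(-z)); expm1 keeps precision in the far tail,
    # and capping -z avoids overflow for scores far below the background.
    tail = -math.expm1(-math.exp(min(-z, 700.0)))
    return n * tail


# Hypothetical background scores standing in for the total score distribution.
background = [19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31]
for s in (0, 30, 640):
    print(s, gumbel_pred_no(background, s))
```

A score far above the background, such as 640, yields a vanishingly small expected count, while a score below every background value yields roughly the number of sequences searched. A real estimate would be fitted over all scores in the database rather than a dozen made-up values.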
ALIGNMENTS



RESULT 1 
ABG5_MOUSE 

ID ABG5_MOUSE STANDARD; PRT; 652 AA.

AC Q99PE8; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 5 (Sterolin-1).
GN ABCG5.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=C57BL/6; TISSUE=Liver;
RX MEDLINE=20578753; PubMed=11138003;
RA Lee M.-H., Lu K., Hazard S., Yu H., Shulenin S., Hidaka H., Kojima H.,
RA Allikmets R., Sakuma N., Pegoraro R., Srivastava A.K., Salen G.,
RA Dean M., Patel S.B.;
RT "Identification of a gene, ABCG5, important in the regulation of
RT dietary cholesterol absorption.";
RL Nat. Genet. 27:79-83(2001).
RN [2]
RP TISSUE SPECIFICITY, AND INDUCTION.
RX MEDLINE=20553648; PubMed=11099417;
RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J.,
RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.;
RT "Accumulation of dietary cholesterol in sitosterolemia caused by
RT mutations in adjacent ABC transporters.";
RL Science 290:1771-1775(2000).

CC -!- FUNCTION: Transporter that appears to play an indispensable role
CC in the selective transport of the dietary cholesterol in and out
CC of the enterocytes and in the selective sterol excretion by the
CC liver into bile.
CC -!- SUBUNIT: May form heterodimers with ABCG8 or be tightly coupled to
CC ABCG8 along a pathway regulating dietary-sterol absorption and
CC excretion (By similarity).
CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable).
CC -!- TISSUE SPECIFICITY: Expressed in the intestine and, at lower
CC level, in the liver.
CC -!- INDUCTION: Upregulated by cholesterol feeding. Possibly mediated
CC by the liver X receptor/retinoid X receptor (LXR/RXR) pathway.
CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White)
CC subfamily.
CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
SQ SEQUENCE 652 AA; 73244 MW; 80CE37ADCC19771E CRC64;



Query Match 99.8%; Score 3363; DB 1; Length 652; 

Best Local Similarity 99.8%; Pred. No. 7.8e-235; 

Matches 651; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 
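In every result of this report, Best Local Similarity equals Matches divided by the sum of Matches, Conservative, Mismatches and Indels: 651/652 gives 99.8% here, and 188/683 gives 27.5% for the 801-residue hit listed earlier. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the formula is inferred from the reported numbers rather than taken from the search program's documentation.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identity over the aligned length.

    Inferred from this report (not documented in it): the aligned length is
    matches + conservative + mismatches + indels; the Gaps field is not used.
    """
    aligned_length = matches + conservative + mismatches + indels
    return 100.0 * matches / aligned_length


# (Matches, Conservative, Mismatches, Indels) copied from the reported results.
reported = {
    "ABG5_MOUSE": (651, 0, 1, 0),
    "ABG5_RAT": (605, 25, 22, 0),
    "ABG5_HUMAN": (523, 64, 64, 1),
    "ABG8_RAT": (196, 122, 249, 73),
    "ABG8_MOUSE": (194, 131, 245, 97),
}
for name, counts in reported.items():
    print(f"{name}: {best_local_similarity(*counts):.1f}%")
```

Rounded to one decimal, the printed values match the Best Local Similarity figures reported for each of these entries.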
RESULT 2
ABG5_RAT



ID ABG5_RAT STANDARD; PRT; 652 AA. 

AC Q99PE7; Q8CIQ4; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 10-OCT-2003 (Rel. 42, Last sequence update) 



DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 5 (Sterolin-1).
GN ABCG5.
OS Rattus norvegicus (Rat).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
OX NCBI_TaxID=10116;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=Sprague-Dawley; TISSUE=Small intestine;
RX MEDLINE=20578753; PubMed=11138003;
RA Lee M.-H., Lu K., Hazard S., Yu H., Shulenin S., Hidaka H., Kojima H.,
RA Allikmets R., Sakuma N., Pegoraro R., Srivastava A.K., Salen G.,
RA Dean M., Patel S.B.;
RT "Identification of a gene, ABCG5, important in the regulation of
RT dietary cholesterol absorption.";
RL Nat. Genet. 27:79-83(2001).
RN [2]
RP REVISION TO 2.
RA Lu K., Lee M.-H., Patel S.B.;
RL Submitted (AUG-2002) to the EMBL/GenBank/DDBJ databases.
RN [3]
RP SEQUENCE FROM N.A., TISSUE SPECIFICITY, AND VARIANT CYS-583.
RC STRAIN=GH, SHR, SHRSP, Sprague-Dawley, Wistar, Wistar Kyoto, and WKA;
RX PubMed=12783625;
RA Yu H., Pandit B., Klett E., Lee M.H., Lu K., Helou K., Ikeda I.,
RA Egashira N., Sato M., Klein R., Batta A., Salen G., Patel S.B.;
RT "The rat STSL locus: characterization, chromosomal assignment, and
RT genetic variations in sitosterolemic hypertensive rats.";
RL BMC Cardiovasc. Disord. 3:4-4(2003).

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG8 or be tightly coupled to 
CC ABCG8 along a pathway regulating dietary-sterol absorption and
CC excretion (By similarity).

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- TISSUE SPECIFICITY: Expressed only in liver and intestine. 

CC -!- POLYMORPHISM: The polymorphism at position 583 is found in strains 

CC SHR, SHRSP and Wistar Kyoto which are both hypertensive and 

CC sitosterolemic. Strains which are hypertensive but not 

CC sitosterolemic do not contain a polymorphism at this position. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 

CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
FT DOMAIN 551 624 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 625 645 6 (POTENTIAL).
FT DOMAIN 646 652 CYTOPLASMIC (POTENTIAL).
FT NP_BIND 87 94 ATP (POTENTIAL).
FT CARBOHYD 585 585 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 592 592 N-LINKED (GLCNAC...) (POTENTIAL).
FT VARIANT 583 583 G -> C (in strains SHR, SHRSP and Wistar
FT Kyoto).
SQ SEQUENCE 652 AA; 73372 MW; 49FEF7372269299D CRC64;



Query Match 93.3%; Score 3144; DB 1; Length 652; 

Best Local Similarity 92.8%; Pred. No. 4.8e-219; 

Matches 605; Conservative 25; Mismatches 22; Indels 0; Gaps 0; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60
I I I I I I II I I I I I I I I I I I I I I I : I I I I I : I I M I I I I I : I I : II I I I I I II I I I I I 
Db 1 MSELPFLSPEGARGPHNNRGSQSSLEEGSVTGSEARHSLGVXNVSFSVSNRVGPWWNIKS 60

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120
I I I I I I I : I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 CQQKWDRKILKDVSLYIESGQTMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180
I I I I I I I I I I I : I I I I I I I I I I I I I II I I II I I I I I I II I I I : I II I I I : I I I I I I I 
Db 121 LRRDQFQDCVSYLLQSDVFLSSLTVRETLRYTAMLALRSSSADFYDKKVEAVLTELSLSH 180

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240
I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 VADQMIGNYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANHIVTjLLVELA 240

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300
I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 241 RRNRIVIVTIHQPRSELFHHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360
I I I I I I II I I I I I I I I I I I I I I I I I I I I I : : I I I I I I I I I I I I I : I I I I I I I I I I I I 
Db 301 FYMDLTSVDTQSREREIETYKRVQMLESAFRQSDICHKILENIERTRHLKTLPMVPFKTK 360

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420
: I I I I I 111111111111111111 I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
Db 361 NPPGMFCKLGVLLRRVTRNLMRNKQWIMRLVQNLIMGLFLIFYLLRVQNNMLKGAVQDR 420

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I :: I I 
Db 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYQKWQMLLAYVLHALPFSIVAT 480

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540
I I I I I II I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I II I I I : I I I I I I I I I I I I I I I I 
Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGMVQNPNIVNSIVALLSI 540

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600
I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I II I I I M I I II I I I I I I I I I I : I : I I 
Db 541 SGLLIGSGFIRNIEEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSVPNNPM 600

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652
I ::| I I :| I II I I I I I I I I I I I Mill III I I I II : I : I I I I I I I I I I 
Db 601 CSMTQGIQFIEKTCPGATSRFTTNFLILYSFIPTLVILGMWFKVRDYLISR 652
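Each alignment row gives the first and last residue positions it covers, so the number of residue letters between them (gap dashes excluded) should equal end - start + 1; this holds for the intact rows checked in this report. A hedged sketch of that consistency check, useful for spotting OCR damage in rows like those above; `check_row` and its regular expression are hypothetical helpers, not part of the search output format.

```python
import re

# Hypothetical pattern for a row such as "Qy 61 CQQKWDRQ... 120".
ROW = re.compile(r"^(Qy|Db)\s+(\d+)\s+([A-Za-z\-\s]+?)\s+(\d+)$")


def check_row(line):
    """Return True if the residue count matches end - start + 1.

    Gap characters ('-') and stray spaces are ignored; every letter
    is counted as one residue. Raises ValueError for non-row lines.
    """
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"not an alignment row: {line!r}")
    _label, start, body, end = m.groups()
    residues = sum(1 for ch in body if ch.isalpha())
    return residues == int(end) - int(start) + 1


print(check_row("Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652"))
print(check_row("Db 601 CSMTQGIQFIEKTCPGATSRFTTNFLILYSFIPTLVILGMWFKVRDYLISR 652"))
```

The first call prints `True` and the second prints `False`: the rat `Db 601` row has one residue too few, plausibly because a `VV` was read as `W` during scanning. The check only flags the row; it cannot say which letter is wrong.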



RESULT 3 
ABG5_HUMAN 

ID ABG5_HUMAN STANDARD; PRT; 651 AA. 

AC Q9H222; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 5 (Sterolin-1).
GN ABCG5.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
RN [1]
RP SEQUENCE FROM N.A., AND VARIANT GLU-604.
RC TISSUE=Liver;
RX MEDLINE=20553648; PubMed=11099417;
RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J.,
RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.;
RT "Accumulation of dietary cholesterol in sitosterolemia caused by
RT mutations in adjacent ABC transporters.";
RL Science 290:1771-1775(2000).
RN [2]
RP SEQUENCE FROM N.A., VARIANTS SITOSTEROLEMIA HIS-389; HIS-419 AND
RP PRO-419, AND VARIANT GLU-604.
RC TISSUE=Liver;
RX MEDLINE=20578753; PubMed=11138003;
RA Lee M.-H., Lu K., Hazard S., Yu H., Shulenin S., Hidaka H., Kojima H.,
RA Allikmets R., Sakuma N., Pegoraro R., Srivastava A.K., Salen G.,
RA Dean M., Patel S.B.;
RT "Identification of a gene, ABCG5, important in the regulation of
RT dietary cholesterol absorption.";
RL Nat. Genet. 27:79-83(2001).
RN [3]
RP REVIEW.
RX MEDLINE=21474438; PubMed=11590207;
RA Schmitz G., Langmann T., Heimerl S.;
RT "Role of ABCG1 and other ABCG family members in lipid metabolism.";
RL J. Lipid Res. 42:1513-1520(2001).
RN [4]
RP VARIANTS SITOSTEROLEMIA GLN-146; HIS-389; PRO-419; HIS-419 AND
RP SER-550, AND VARIANT GLU-604.
RX MEDLINE=21344600; PubMed=11452359;
RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H.,
RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E.,
RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K.,
RA Patel S.B.;
RT "Two genes that map to the STSL locus cause sitosterolemia: genomic
RT structure and spectrum of mutations involving sterolin-1 and
RT sterolin-2, encoded by ABCG5 and ABCG8, respectively.";
RL Am. J. Hum. Genet. 69:278-290(2001).

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG8 or be tightly coupled to 
CC ABCG8 along a pathway regulating dietary-sterol absorption and

CC excretion. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- TISSUE SPECIFICITY: Strongly expressed in the liver, lower levels 

CC in the small intestine and colon. 

CC -!- DISEASE: Defects in ABCG5 are a cause of sitosterolemia 

CC [MIM:210250]; also known as phytosterolemia or shellfish

CC sterolemia. It is a rare autosomal recessive disorder 

CC characterized by increased intestinal absorption of all sterols 

CC including cholesterol, plant and shellfish sterols, and decreased 

CC biliary excretion of dietary sterols into bile. Sitosterolemia 

CC patients have hypercholesterolemia, very high levels of plant 

CC sterols in the plasma, and frequently develop tendon and tuberous 

CC xanthomas, accelerated atherosclerosis and premature coronary 

CC artery disease. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF320293; AAG40003.1; -. 

DR EMBL; AF312715; AAG53099.1; -. 

DR Genew; HGNC:13886; ABCG5.
DR MIM; 605459; -.
DR MIM; 210250; -.
DR GO; GO:0030299; P:cholesterol absorption; NAS.
DR InterPro; IPR003593; AAA_ATPase.
DR InterPro; IPR003439; ABC_transporter.
DR Pfam; PF00005; ABC_tran; 1.
DR ProDom; PD000006; ABC_transporter; 1.
DR SMART; SM00382; AAA; 1.
DR PROSITE; PS00211; ABC_TRANSPORTER_1; FALSE_NEG.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.
Disease mutation.
FT TRANSMEM 422 442 2 (POTENTIAL).
FT DOMAIN 443 462 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 463 483 3 (POTENTIAL).
FT DOMAIN 484 503 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 504 524 4 (POTENTIAL).
FT DOMAIN 525 528 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 529 549 5 (POTENTIAL).
FT DOMAIN 550 623 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 624 644 6 (POTENTIAL).
FT DOMAIN 645 651 CYTOPLASMIC (POTENTIAL).
FT NP_BIND 86 93 ATP (POTENTIAL).
FT CARBOHYD 584 584 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 591 591 N-LINKED (GLCNAC...) (POTENTIAL).
FT VARIANT 146 146 E -> Q (in sitosterolemia).
FT /FTId=VAR_012244.
FT VARIANT 389 389 R -> H (in sitosterolemia).
FT /FTId=VAR_012245.
FT VARIANT 419 419 R -> H (in sitosterolemia).
FT /FTId=VAR_012246.
FT VARIANT 419 419 R -> P (in sitosterolemia).
FT /FTId=VAR_012247.
FT VARIANT 550 550 R -> S (in sitosterolemia).
FT /FTId=VAR_012248.
FT VARIANT 604 604 Q -> E.
FT /FTId=VAR_012249.
SQ SEQUENCE 651 AA; 72503 MW; 950BABFCBB6A1536 CRC64;



Query Match 81.5%; Score 2744.5; DB 1; Length 651; 

Best Local Similarity 80.2%; Pred. No. 3e-190; 

Matches 523; Conservative 64; Mismatches 64; Indels 1; Gaps 1; 

Qy 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

I I : I I : I I : I : I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I 

Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

I : I : I I I I I I I I I I I : I I I I I I I I I II I I I I I I I I I I I : I I I I I II I I I : I I I 
Db 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180 

I M : I I I I I I I I I I I I I I I I I I I I I I I llhll: I : : I I I I I I I I I II I I 
Db 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179 

Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

I I I : : I I : I : I I I I : I I I I I I I I I I I I I I I I I I I : I I II I I I II II I I I I I : I I III 
Db 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVKLFDEPTTGLDCMTANQIVVljLVELA 239

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

M : I I I :: I I I I I I I I I I I I I I I I I I : : I I I : I I I I I III I I I : I I I I I I I I I I I I I 
Db 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299



Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360

I I I I I I I I I I I I : I I I I I I I I I I I : I I : I : I I II I : I I I I :: I I I M II I I I I I 

Db 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359

Qy 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

| | | : I I I I I I I I I I I I I I : I I I III I I : I I I I I I I I I : I : : I I I : : I I I II : I I I 
Db 360 DSPGVFSKLGVXLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVIjRVRSNVLKGAIQDR 419 

Qy 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480 

MINI I I I I I I I I I I I I I I I I I : I I I I I I I I I I M I I 1111:111 I I I I I I I I : I I 
Db 420 VGLLYQHVGATPYTGMLNAWLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWAT 479 

Qy 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540

: I I I I I I I I I I I I : I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I : I I I I I I 
Db 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIWSVVALLSI 539

Qy 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600

: I : I : I I I I : I I I I I I I I I II: I I I I I I I I I I I I I I I I I I I I I II I II I : : I I 
Db 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

Qy 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652

II I I I : I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I : I I : I I : I I I I 

Db 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 4 
ABG8_RAT 

ID ABG8_RAT STANDARD; PRT; 694 AA. 

AC P58428; Q8CIQ5; Q923R7; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 15-MAR-2004 (Rel. 43, Last sequence update) 

DT 15-MAR-2004 (Rel. 43, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 8 (Sterolin-2).
GN ABCG8.
OS Rattus norvegicus (Rat).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
OX NCBI_TaxID=10116;
RN [1]
RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 2).
RC STRAIN=Sprague-Dawley;
RX MEDLINE=21344600; PubMed=11452359;
RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H.,
RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E.,
RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K.,
RA Patel S.B.;
RT "Two genes that map to the STSL locus cause sitosterolemia: genomic
RT structure and spectrum of mutations involving sterolin-1 and
RT sterolin-2, encoded by ABCG5 and ABCG8, respectively.";
RL Am. J. Hum. Genet. 69:278-290(2001).
RN [2]
RP REVISIONS TO 3-4.
RA Lu K., Yu H., Lee M.-H., Patel S.B.;
RL Submitted (AUG-2002) to the EMBL/GenBank/DDBJ databases.
RN [3]
RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 3), AND TISSUE SPECIFICITY.
RC STRAIN=GH, SHR, SHRSP, Sprague-Dawley, Wistar, Wistar Kyoto, and WKA;
RC TISSUE=Intestine, and Liver;
RX PubMed=12783625;
RA Yu H., Pandit B., Klett E., Lee M.-H., Lu K., Helou K., Ikeda I.,
RA Egashira N., Sato M., Klein R., Batta A., Salen G., Patel S.B.;
RT "The rat STSL locus: characterization, chromosomal assignment, and
RT genetic variations in sitosterolemic hypertensive rats.";
RL BMC Cardiovasc. Disord. 3:4-4(2003).

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG5 or be tightly coupled to 
CC ABCG5 along a pathway regulating dietary-sterol absorption and
CC excretion (By similarity).

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=3; 

CC Name=3; 

CC IsoId=P58428-3; Sequence=Displayed; 

CC Name=1;
CC IsoId=P58428-1; Sequence=VSP_008767;
CC Name=2;
CC IsoId=P58428-2; Sequence=VSP_008767, VSP_000054;

CC Note=No experimental confirmation available; 

CC -!- TISSUE SPECIFICITY: Highest expression in liver, with lower levels 
CC in small intestine and colon. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC -!- CAUTION: Seems to have a defective ATP-binding region. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC
DR EMBL; AF351785; AAK84831.2; -.
DR EMBL; AY145899; AAN64276.1;
DR EMBL; AF404109; AAK85393.1; -.
DR InterPro; IPR003593; AAA_ATPase.
DR InterPro; IPR003439; ABC_transporter.
DR Pfam; PF00005; ABC_tran; 1.
DR ProDom; PD000006; ABC_transporter; 1.
DR SMART; SM00382; AAA; 1.
DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Glycoprotein; Transmembrane; Transport; Alternative splicing. 

FT DOMAIN 1 434 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 435 455 1 (POTENTIAL).
FT DOMAIN 456 468 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 469 489 2 (POTENTIAL).
FT DOMAIN 490 517 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 518 538 3 (POTENTIAL).
FT DOMAIN 539 547 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 548 568 4 (POTENTIAL).
FT DOMAIN 569 590 CYTOPLASMIC (POTENTIAL).
FT VARSPLIC 56 77 Missing (in isoform 1 and isoform 2).
FT /FTId=VSP_008767.
FT VARSPLIC ... ... Missing (in isoform 2).
FT /FTId=VSP_000054.
FT CONFLICT 3 4 EK -> QT (IN REF. 3).
SQ SEQUENCE 694 AA; 78236 MW; 67F67C195F417587 CRC64;

Query Match 21.4%; Score 720.5; DB 1; Length 694;



Best Local Similarity 30.6%; Pred. No. 2.7e-44; 
Matches 196; Conservative 122; Mismatches 249; Indels 73; Gaps 19; 

Qy 20 GSLSSLEQGSVT GTEARH-SLGVLHVSYSVSNRVGPW -WNI 58 

| : : I I : I I I I I I I I I : : : : I I I I 

Db 42 GQSNTLEWDLTYQGGTCLRSWGQEDPHMSLG-LSESVDMASQV- PWFEQLAQFKLPWRS 99 

Qy 59 KSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNG 118 

: | || I :::| : III:: |:||:| |: MM MM |::::M 
Db 100 RGSQDSWDLGI-RNLSFKVRSGQMLAIIGSAGCGRATLLDVITGRDHGGKMKSGQIWING 158 

Qy 119 CELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELS 177

I I :: I I I | : | || I I I I : I : I :: I : I : II I : M 
Db 159 QPSTPQLIQKCVAHVRQQDQLLPNLTVRETLTFIAQMRLPKTFSQAQRDKRVEDVIAELR 218

Qy 178 LSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLA 237

| | : : I : I : I I II M M I I II : I : : : I M M : M I I M : I M 
Db 219 LRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVRTLS 278 

Qy 238 ELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSN 297 

| | : : | : I : : : : I M M : : M II : : : I I : : I : M : I : I I I I I : II 
Db 279 RLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGVAQHMVQYFTSIGYPCPRYSN 338 

Qy 298 P FD FYMDLT S VDTQ S RERE I ET YKRVQMLECAFKE SDIYHKI-LENIERARYLKT 351 

I II I : II I I : I : M I : I : I : : : : I II I I — = I : 

Db 339 PADFYVDLT S I DRRS KEQEVATMEKARLLAALFLEKVQGFDDFLWKAEAKS LDTGT YAVS 398 

Qy 352 LPMVPFKTKDP P GMFGKLGVLL RRVT RNLMRN KQAVI MRLVQN L I MGL FL I F 403 

: | : | III : MM I M : : : ' : I I : I 

Db 399 QTL TQDTNCGTAAELPGMIQQFTTLIRRQISNDFRDLPTLFIHGAEACLMSLIIGF 454 

Qy 404 YLLRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQ 463

: | : | ||:: I : : I : I : I :: I : M M 

Db 455 LYYGHADKPL--SFMDMAALLFMIGALIPFNVILDWSKCHSERSLLYYELEDGLYTAGP 512

Qy 464 MLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL 517 

I M I I : M II II I I : : I 

Db 513 YFFAKVLGELPEHCAYVIIYGMPIYWLTNLRP GPELFLLHFMLLWLWFCCR 564 

Qy 518 TLVLLGIVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEIL 576 

|:| I ::| : : :||: |: : I : M ::| I 

Db 565 TMALAASAMLPTFHMSSFCCNALYNSFYLTAGFMINLNNLWIVPAWISKMSFLRWCFSGL 624

Qy 577 VVNEFYG LNFTCGGSN-- TSM-LN-HPMCAI 603



: : | I II: I I : I I I I I : II 

Db 625 MQIQFNGHIYTTQIGNLTFSVPGDAMVTAMDLNSHPLYAI 664 



RESULT 5 
ABG8_MOUSE

ID ABG8_MOUSE STANDARD; PRT; 673 AA.
AC Q9DBM0;
DT 28-FEB-2003 (Rel. 41, Created)
DT 28-FEB-2003 (Rel. 41, Last sequence update)
DT 28-FEB-2003 (Rel. 41, Last annotation update)
DE ATP-binding cassette, sub-family G, member 8 (Sterolin-2).
GN ABCG8.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 2).
RC STRAIN=C57BL/6; TISSUE=Liver;
RX MEDLINE=21344600; PubMed=11452359;
RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H.,
RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E.,
RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K.,
RA Patel S.B.;
RT "Two genes that map to the STSL locus cause sitosterolemia: genomic
RT structure and spectrum of mutations involving sterolin-1 and
RT sterolin-2, encoded by ABCG5 and ABCG8, respectively.";
RL Am. J. Hum. Genet. 69:278-290(2001).
RN [2]
RP SEQUENCE FROM N.A. (ISOFORM 1).
RC STRAIN=C57BL/6J; TISSUE=Liver;

RX MEDLINE=21085660; PubMed=11217851; 

RA Kawai J., Shinagawa A., Shibata K., Yoshino M., Itoh M., Ishii Y.,
RA Arakawa T., Hara A., Fukunishi Y., Konno H., Adachi J., Fukuda S.,
RA Aizawa K., Izawa M., Nishi K., Kiyosawa H., Kondo S., Yamanaka I.,
RA Saito T., Okazaki Y., Gojobori T., Bono H., Kasukawa T., Saito R.,
RA Kadota K., Matsuda H.A., Ashburner M., Batalov S., Casavant T.,
RA Fleischmann W., Gaasterland T., Gissi C., King B., Kochiwa H.,
RA Kuehl P., Lewis S., Matsuo Y., Nikaido I., Pesole G., Quackenbush J.,
RA Schriml L.M., Staubli F., Suzuki R., Tomita M., Wagner L., Washio T.,
RA Sakai K., Okido T., Furuno M., Aono H., Baldarelli R., Barsh G.,
RA Blake J., Boffelli D., Bojunga N., Carninci P., de Bonaldo M.F.,
RA Brownstein M.J., Bult C., Fletcher C., Fujita M., Gariboldi M.,
RA Gustincich S., Hill D., Hofmann M., Hume D.A., Kamiya M., Lee N.H.,
RA Lyons P., Marchionni L., Mashima J., Mazzarelli J., Mombaerts P.,
RA Nordone P., Ring B., Ringwald M., Rodriguez I., Sakamoto N.,
RA Sasaki H., Sato K., Schoenbach C., Seya T., Shibata Y., Storch K.-F.,
RA Suzuki H., Toyo-oka K., Wang K.H., Weitz C., Whittaker C., Wilming L.,
RA Wynshaw-Boris A., Yoshida K., Hasegawa Y., Kawaji H., Kohtsuki S.,
RA Hayashizaki Y.;

RT "Functional annotation of a full-length mouse cDNA collection."; 

RL Nature 409:685-690(2001). 
RN [3] 

RP TISSUE SPECIFICITY, AND INDUCTION.
RX MEDLINE=20553648; PubMed=11099417;
RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J.,
RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.;

RT "Accumulation of dietary cholesterol in sitosterolemia caused by 

RT mutations in adjacent ABC transporters."; 

RL Science 290:1771-1775(2000). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG5 or be tightly coupled to
CC ABCG5 along a pathway regulating dietary-sterol absorption and
CC excretion (By similarity).

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
CC Name=1;
CC IsoId=Q9DBM0-1; Sequence=Displayed;
CC Name=2;
CC IsoId=Q9DBM0-2; Sequence=VSP_000053;

CC Note=No experimental confirmation available; 

CC -!- TISSUE SPECIFICITY: Expressed in the intestine and, at lower 

CC level, in the liver. 

CC -!- INDUCTION: Upregulated by cholesterol feeding. Possibly mediated
CC by the liver X receptor/retinoid X receptor (LXR/RXR) pathway.

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC -!- CAUTION: Seems to have a defective ATP-binding region. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF324495; AAK84079.1; -. 

DR EMBL; AK004871; BAB23630.1; -. 

DR MGD; MGI:1914720; Abcg8.
DR InterPro; IPR003439; ABC_transporter.
DR Pfam; PF00005; ABC_tran; 1.
DR ProDom; PD000006; ABC_transporter; 1.
DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Glycoprotein; Transmembrane; Transport; Alternative splicing. 



FT DOMAIN 1 413 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 414 434 1 (POTENTIAL).
FT DOMAIN 435 447 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 448 468 2 (POTENTIAL).
FT DOMAIN 469 496 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 497 517 3 (POTENTIAL).
FT DOMAIN 518 526 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 527 547 4 (POTENTIAL).
FT DOMAIN ... ... EXTRACELLULAR (POTENTIAL).
FT TRANSMEM ... 660 6 (POTENTIAL).
FT DOMAIN 661 673 CYTOPLASMIC (POTENTIAL).



FT CARBOHYD 619 619 N-LINKED (GLCNAC...) (POTENTIAL).

FT VARSPLIC 377 377 Missing (in isoform 2).

FT /FTId=VSP_000053.

SQ SEQUENCE 673 AA; 75995 MW; 78012611A5DF2589 CRC64; 

Query Match 20.8%; Score 701.5; DB 1; Length 673; 

Best Local Similarity 29.1%; Pred. No. 6.1e-43; 

Matches 194; Conservative 131; Mismatches 245; Indels 97; Gaps 19; 

Qy 27 QGSVTGTEARHSLGVLHVSYS VSNRVGPW WNIKS 60 

I I : : I : : I I : : I I : : : : I I I I I 

Db 25 QDSLFSSESDNS LYFTYSGQSNTLEVRDLTYQVDIASQV-PWFEQLAQFKIPWRSHS 80 

Qy 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

I : | : : : I : I I I : : I : I I I I I : : I I I I : I I I : : : : I I 

Db 81 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 139 

Qy 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLS 179

: I : : I I I I : I I I I I I I : I : I I : I : I : I I I : I I I 
Db 140 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 199 

Qy 180 HVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL 239 

| : : | : I : I I I I I II I I I I I : I : : : II I I I = I I I I I : : I 1=1 

Db 200 QCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 259

Qy 240 ARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

| : : | : | : : : : I I I I I : : I : I I : : : I I : : I : : I : : I : I : I I I : I I I 
Db 260 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 319 

Qy 300 DFYMDLTSVDTQSREREI ETYKRVQMLECAFKE SDIYHKI-LENIERARYLKTLP 353 

I I I : I I I I : I : I : I I I : I : : I I I I I I : : : : : I 

Db 320 DFYVT)LTSIDRRSKEREVATVEKAQS]WUiFLEKVQGFDDFLWKAE^KELNTSTHTVSLT 379

Qy 354 MVPFKTKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYL 405 

: |:| III : hll I I: : :| I : I 

Db 380 L TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF — 433 

Qy 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465

| : : : | ||: : |: :|: |: |:: I Mill 

Db 434 LYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVISKCHSERSMLYYELEDGLYTAGPYF 493

Qy 466 LAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL TL 519

I : I II :h: II II I || | : : |: 

Db 494 FAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLLVWLWFCCRTM 545 

Qy 520 VLLGIVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVV 578 

| | ::| : : :| |: |: : I : :| ::| h 

Db 546 ALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQ 605 

Qy 579 NEFYGL NFTCGGSNT SML NHPMCA ITQGVQFIEKTCPGATSRFT 622 

: I I III :|: • I I : I I |: : 
Db 606 IQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISY 652 

Qy 623 ANFLILY 629 

I I I I 

Db 653 -GFLFLY 658 
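The "Best Local Similarity" percentage reported for each hit can be recomputed from the counts on its "Matches" line. The sketch below assumes, from the figures in these entries rather than from the search program's documentation, that the value is Matches divided by the sum of Matches, Conservative, Mismatches and Indels; the function name `best_local_similarity` is hypothetical.

```python
def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Percent identical positions over the aligned region (assumed definition)."""
    # Assumption: aligned length = matches + conservative + mismatches + indels.
    aligned = matches + conservative + mismatches + indels
    return round(100.0 * matches / aligned, 1)


# Counts from the mouse Abcg8 hit above (Matches 194; Conservative 131;
# Mismatches 245; Indels 97).
print(best_local_similarity(194, 131, 245, 97))  # 29.1
```

With the same assumed formula, the counts listed for the other hits in this listing give 29.0%, 28.1%, 29.6% and 26.5%, matching their reported values.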



RESULT 6 
ABG2_HUMAN 

ID ABG2_HUMAN STANDARD; PRT; 655 AA. 

AC Q9UNQ0; O95374; Q9BY73; Q9NUS0;

DT 16-OCT-2001 (Rel. 40, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 2 (Placenta-specific ATP- 

DE binding cassette transporter) (Breast cancer resistance protein).

GN ABCG2 OR ABCP OR BCRP OR BCRP1. 

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Placenta; 

RX MEDLINE=99065313; PubMed=9850061;

RA Allikmets R., Schriml L.M., Hutchinson A., Romano-Spica V., Dean M.;

RT "A human placenta-specific ATP-binding cassette gene (ABCP) on 

RT chromosome 4q22 that is involved in multidrug resistance."; 

RL Cancer Res. 58:5337-5339(1998). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Breast cancer; 

RX MEDLINE=99080071; PubMed=9861027;

RA Doyle L.A., Yang W., Abruzzo L.V., Krogmann T., Gao Y., Rishi A.K.,

RA Ross D.D.; 

RT "A multidrug resistance transporter from human MCF-7 breast cancer 

RT cells."; 

RL Proc. Natl. Acad. Sci. U.S.A. 95:15665-15670(1998). 

RN [3] 

RP ERRATUM. 

RA Doyle L.A., Yang W., Abruzzo L.V., Krogmann T., Gao Y., Rishi A.K.,

RA Ross D.D.; 

RL Proc. Natl. Acad. Sci. U.S.A. 96:2569-2569(1999). 

RN [4] 

RP SEQUENCE FROM N.A. 

RA Kage K., Tsukahara S., Sugiyama T., Asada Ishikawa E., Tsuruo T 

RA Sugimoto Y.;

RT "Breast cancer resistance protein constitutes a 140-kDa complex as

RT homodimer.";

RL Submitted (MAR-2001) to the EMBL/GenBank/DDBJ databases.

RN [5] 

RP SEQUENCE OF 198-655 FROM N.A. 

RC TISSUE=Placenta; 

RA Isogai T., Ota T., Hayashi K., Sugiyama T., Otsuki T., Suzuki Y.,

RA Nishikawa T., Nagai K., Sugano S., Shiratori A., Sudo H.,

RA Wagatsuma M., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M

RA Takahashi M., Chiba Y., Ishida S., Murakawa K., Ono Y., Takiguchi S

RA Watanabe S., Kimura K., Murakami K., Ishii S., Kawai Y., Saito K.,

RA Yamamoto J., Wakamatsu A., Nakamura Y., Nagahari K., Masuho Y.,

RA Ninomiya K., Iwayanagi T.;

RT "NEDO human cDNA sequencing project.";

RL Submitted (FEB-2000) to the EMBL/GenBank/DDBJ databases.
RN [6]
RP REVIEW.
RX MEDLINE=21474438; PubMed=11590207;
RA Schmitz G., Langmann T., Heimerl S.;
RT "Role of ABCG1 and other ABCG family members in lipid metabolism.";
RL J. Lipid Res. 42:1513-1520(2001).
CC -!- FUNCTION: Xenobiotic transporter that appears to play a major role
CC in the multidrug resistance phenotype of a specific MCF-7 breast
CC cancer cell line. When overexpressed, the transfected cells become
CC resistant to mitoxantrone, daunorubicin and doxorubicin, display
CC diminished intracellular accumulation of daunorubicin, and
CC manifest an ATP-dependent increase in the efflux of rhodamine 123.
CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable).
CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White)
CC subfamily.

DR EMBL; AF103796; AAD09188.1
DR EMBL; AF098951; AAC97367.1
DR EMBL; AB056867; BAB39212.1
DR EMBL; AK002040; BAA92050.1
DR Genew; HGNC:74; ABCG2.
DR MIM; 603756;
DR GO; GO:0016021; C:integral to membrane; TAS.
DR GO; GO:0005524; F:ATP binding; TAS.
DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; TAS.
DR GO; GO:0005215; F:transporter activity; TAS.
DR GO; GO:0008559; F:xenobiotic-transporting ATPase activity; TAS.
DR GO; GO:0009315; P:drug resistance; TAS.
DR GO; GO:0006810; P:transport; TAS.
DR InterPro; IPR003593; AAA_ATPase.
DR InterPro; IPR003439; ABC_transporter.
DR Pfam; PF00005; ABC_tran; 1.
DR ProDom; PD000006; ABC_transporter; 1.
DR SMART; SM00382; AAA; 1.
DR PROSITE; PS00211; ABC_TRANSPORTER_1; FALSE_NEG.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.
KW ATP-binding; Transmembrane; Transport.

CYTOPLASMIC (POTENTIAL) . 
POTENTIAL. 

EXTRACELLULAR (POTENTIAL) . 
POTENTIAL. 

CYTOPLASMIC (POTENTIAL) . 
POTENTIAL. 

EXTRACELLULAR (POTENTIAL) . 
POTENTIAL. 

CYTOPLASMIC (POTENTIAL) . 
POTENTIAL. 

EXTRACELLULAR (POTENTIAL) . 
POTENTIAL. 

CYTOPLASMIC (POTENTIAL) . 
NP BIND
418
FT
FT

CARBOHYD
N-LINKED (GLCNAC...) (POTENTIAL).


FT 


CONFLICT 


z4 




~\T ^ 7\ /TM DT?T7* O A "NT Ft /1\ 


FT CONFLICT 166 166 E -> Q (IN REF. 2 AND 4).
FT CONFLICT 208 208 F -> S (IN REF. 1).
FT CONFLICT 315 316 MISSING (IN REF. 5).
FT CONFLICT 482 482 R -> T (IN REF. 2).

SQ SEQUENCE 655 AA; 72343 MW; 89A6D3511DC5CCE0 CRC64;



Query Match 20.5%; Score 690.5; DB 1; Length 655; 

Best Local Similarity 29.0%; Pred. No. 3.6e-42; 

Matches 181; Conservative 141; Mismatches 247; Indels 55; Gaps 16; 

Qy 25 LEQGSVTGTEARHS LGVLHVS YSVSNRVGPWWNI KSCQQKWDRQI LKDV 73 

: ||: I I I I :: | | : | |:: :::| | :: 

Db 12 VSQGNTNGFPATVSNDLKAFTEGAVLSFHNICYRVKLKSG FLPCRKPVEKEILSNI 67 

Qy 74 SLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYV 133

: :: I : I I I : I I I : : I I I : : I :| I |:| :|| I |: I I 
Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLINGAP-RPANFKCNSGYV 124 

Qy 134 LQSDVFLSSLTVRETLRYTAMLALCRSSADF-YNKKVEAVMTELSLSHVADQMIGSYNFG 192

: I I I : : I I I I I I : : : I I I : : I : : : hill III : h 
Db 125 VQDDVVT^GTLTVRENLQFSAALRLATTMTNHEKNERINRVIEELGLDKVADSKVGTQFI^ 184

Qy 193 GISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQ 252 

I: I I I h I I I : I : I I :: II I I I I I I I I I I :: I I I : : : : I : I : I M 
Db 185 GVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVT.LLLKRMSKQGRTIIFSIHQ 244 

Qy 253 PRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQ- 311

II : |: I I : : I hi I : I I I : I : I I I : : I I I h : I : : h 

Db 245 PRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTA 304 

Qy 312 -SREREIETYKRVQMLECAFKESDIYHKI . — LENI ERARYLKT 351 

: | | | : I :: : I : : : : I : I I : : : 

Db 305 VALNRE-EDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITV 363 

Qy 352 LPMVPFKTKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNN 411 

: : I : I : : I : I I : I I I I : : : : : : I I : : : h 

Db 364 FKEISYTT S FCHQLRWVS KRS FKNLLGNPQAS I AQI I VTWLGLVI GAI YFGLKND 419 

Qy 412 TLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVL- 470 

: : | : | | : | : | :: : : I I I I : : : I II I : I 

Db 420 ST--GIQNRAGVLFFLTTNQCFSS-VSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLS 476

Qy 471 HVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 530 

: I I : : : : | |: : I : I I I I : I I : : : : : I I : : 

Db 477 DLLPMRMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAIAAGQSV 533 

Qy 531 VNSIVALLSIS--GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTC 588

|: h:| ::| M : |: : I III: :| I III I II I 

Db 534 VSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNF-C 592 

Qy 589 GGSNTSMLNHPMCAITQGVQFIEK 612 

II: I I I ::: I 



Db 593 PGLNATGNNPCNYATCTGEEYLVK 616



RESULT 7 
ABG8_HUMAN



ID ABG8_HUMAN STANDARD; PRT; 673 AA. 

AC Q9H221; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 8 (Sterolin-2).

GN ABCG8.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE FROM N.A., VARIANTS SITOSTEROLEMIA THR-231; GLN-263; ARG-574

RP AND ARG-596, AND VARIANT CYS-54.

RX MEDLINE=20553648; PubMed=11099417;

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J.,

RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.;

RT "Accumulation of dietary cholesterol in sitosterolemia caused by

RT mutations in adjacent ABC transporters.";

RL Science 290:1771-1775(2000).

RN [2] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 2), VARIANTS SITOSTEROLEMIA 

RP HIS-184; THR-231; GLN-263; HIS-405; PRO-501; SER-543; PRO-572; 

RP GLU-574; ARG-574; ARG-596 AND PHE-570 DEL, AND VARIANTS HIS-19; 

RP CYS-54; LYS-238; VAL-259; LYS-400; ARG-575 AND ALA-632.

RC TISSUE=Liver;

RX MEDLINE=21344600; PubMed=11452359;

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H., 

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E., 

RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K.,

RA Patel S.B.; 

RT "Two genes that map to the STSL locus cause sitosterolemia: genomic 

RT structure and spectrum of mutations involving sterolin-1 and 

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively."; 

RL Am. J. Hum. Genet. 69:278-290(2001). 

RN [3] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207;

RA Schmitz G., Langmann T., Heimerl S.;

RT "Role of ABCG1 and other ABCG family members in lipid metabolism.";

RL J. Lipid Res. 42:1513-1520(2001). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG5 or be tightly coupled to 
CC ABCG5 along a pathway regulating dietary-sterol absorption and

CC excretion. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=2; 

CC Name=1;

CC IsoId=Q9H221-1; Sequence=Displayed;

CC Name=2;

CC IsoId=Q9H221-2; Sequence=VSP_000052;

CC Note=Minor form detected in approximately 10% of the cDNA 

CC clones; 

CC -!- TISSUE SPECIFICITY: Strongly expressed in the liver, lower levels 
CC in the small intestine and colon. Detectable in a wide variety of 

CC human tissues. 

CC -!- DISEASE: Defects in ABCG8 are a cause of sitosterolemia 

CC [MIM:210250]; also known as phytosterolemia or shellfish

CC sterolemia. It is a rare autosomal recessive disorder 

CC characterized by increased intestinal absorption of all sterols 

CC including cholesterol, plant and shellfish sterols, and decreased 

CC biliary excretion of dietary sterols into bile. Sitosterolemia 

CC patients have hypercholesterolemia, very high levels of plant 

CC sterols in the plasma, and frequently develop tendon and tuberous 

CC xanthomas, accelerated atherosclerosis and premature coronary 

CC artery disease. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC -!- CAUTION: Seems to have a defective ATP-binding region. 


DR EMBL; AF320294; AAG40004.1; -. 

DR EMBL; AF324494; AAK84078.1; -. 

DR EMBL; AF351824; AAK84663.1; -. 

DR EMBL; AF351812; AAK84663.1; JOINED. 

DR EMBL; AF351813; AAK84663.1; JOINED. 

DR EMBL; AF351814; AAK84663.1; JOINED. 

DR EMBL; AF351815; AAK84663.1; JOINED. 

DR EMBL; AF351816; AAK84663.1; JOINED. 

DR EMBL; AF351817; AAK84663.1; JOINED. 

DR EMBL; AF351818; AAK84663.1; JOINED. 

DR EMBL; AF351819; AAK84663.1; JOINED. 

DR EMBL; AF351820; AAK84663.1; JOINED.

DR EMBL; AF351821; AAK84663.1; JOINED.

DR EMBL; AF351822; AAK84663.1; JOINED.

DR EMBL; AF351823; AAK84663.1; JOINED.

DR Genew; HGNC:13887; ABCG8.

DR MIM; 605460; -.

DR MIM; 210250; -.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Glycoprotein; Transmembrane; Transport; Alternative splicing; 

KW Polymorphism; Disease mutation. 
L -> P (in sitosterolemia).
/FTIa— VAK UlZzoy.
VARIANT
R -> S (in sitosterolemia).
/FTIa— VAK Ul^zbU.
/FTId=VAR_012265.


FT VARIANT 596 596 L -> R (in sitosterolemia).
FT /FTId=VAR_012266.
FT VARIANT 632 632 V -> A.
FT /FTId=VAR_012267.


SQ SEQUENCE 673 AA; 75678 MW; 594AFD1D6C1BB50F CRC64;



Query Match 20.4%; Score 688.5; DB 1; Length 673;

Best Local Similarity 28.1%; Pred. No. 5.2e-42; 

Matches 188; Conservative 125; Mismatches 233; Indels 123; Gaps 



Qy 37 HSLGVLHVSYSV— SNRVGPW WNI KSCQQKWDRQI LKDVSLYI ESGQIMC 84
Db 45 NTLEVRDLNYQVDLASQV-PWFEQLAQFKMPWTSPSCQNSCELGI-QNLSFKVRSGQMLA 102

Qy 85 ILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLT 144
|:|||| |: :|ll hll : | ::| I : 1 :|l
Db 103 IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLT 162

Qy 145 VRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVS 203
| | | | | : | : 1 1 : 1 : 1 : 1 1 1 : 1 1 1 II : 1 : 1 : 1 1 1 1 1 1 1 1
Db 163 VRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVS 222

Qy 204 IAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDK 263
| | | | : | : : : I I I II : 1 1 1 1 1 : : 1 1 : 1 1 : : 1 : 1 : : : : 1 1 1 1 1 : : 1 : 1 1
Db 223 IGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDL 282

Qy 264 IAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRV 323
: : : | | : : | : | : : I 1 1 1 II : 1 II 1 1 1 : 1 1 1 1 : 1 : 1 1 h 1 : 1 : :
Db 283 VLLLTTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPMFYVDLTSIDRRSREQELATREKA 342

Qy 324 QMLECAFKESDI YHKILENIERARYL KTLPM VPFKT 359
Ml : 1 : 1 1 | : : 1 1
Db 343 QSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPT 390

Qy 360 KDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQD 419
IN : |:| I II: ::: : :l : : 1 1
Db 391 K-MPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGF--LYFGHGSIQLSFMD 447

Qy 420 RVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIA 479
II: : I: :|: :: II: 1 =1111 1 :l M
Db 448 TAALLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAY 507

Qy 480 T VI FS S VC YWT LGL Y P EVARF GYFSAALLAPHLIGEFLTLVLL 522
: I : II 1 1 : 1 =1111 = 1=1
Db 508 IIIYGMPTYWIANLRPGLQPFLLHFLLWLWFCCRIMAIAAAALLPTFH^ 566

Qy 523 GIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEF- 581
: : | | : 1 : : : : 1 : : 1 1 1 : : 1
Db 567 YNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFS 609

Qy 582 YGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTANFLILY 629
I : || I :| II = 11:
Db 610 648

Qy 630 GFIPALVIL 638
1 ::|
Db 649 GLSGGFMVL 657





RESULT 8 
YOH5_YEAST 

ID YOH5_YEAST STANDARD; PRT; 1294 AA. 

AC Q08234; Q08233; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 16-OCT-2001 (Rel. 40, Last sequence update) 

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE Probable ATP-dependent transporter YOL074C/YOL075C.

GN YOL074C/YOL075C. 



OS Saccharomyces cerevisiae (Baker's yeast). 

OC Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes; 

OC Saccharomycetales; Saccharomycetaceae; Saccharomyces. 

OX NCBI_TaxID=4932; 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=97321807; PubMed=9178509; 

RA Tzermia M., Katsoulou C., Alexandraki D.;

RT "Sequence analysis of a 33.2 kb segment from the left arm of yeast 

RT chromosome XV reveals eight known genes and ten new open reading 

RT frames including homologues of ABC transporters, inositol 

RT phosphatases and human expressed sequence tags."; 

RL Yeast 13:583-589(1997). 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Potential). 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 


DR EMBL; Z74817; CAA99085.1; -. 

DR EMBL; Z74816; CAA99084.1; -. 

DR PIR; S77690; S77690. 

DR GermOnline; 143497; -. 

DR SGD; S0005435; YOL075C. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 2.

DR ProDom; PD000006; ABC_transporter; 2.

DR SMART; SM00382; AAA; 2.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 2.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 2.

KW Hypothetical protein; ATP-binding; Transmembrane; Glycoprotein;
KW Transport; Repeat.
.) (POTENTIAL).
.) (POTENTIAL).
N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 528 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD N-LINKED (GLCNAC...) (POTENTIAL).



SQ SEQUENCE 1294 AA; 145157 MW; C555500A45E9284E CRC64; 



Query Match 18.6%; Score 628; DB 1; Length 1294; 

Best Local Similarity 29.6%; Pred. No. 2.8e-37; 

Matches 183; Conservative 123; Mismatches 237; Indels 76; Gaps 21; 

Qy 67 RQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRT GTLEGEVFVNGCELR 122 

: : | | : I : : I I I : I I I I I I : : I I : I I II I : : I : I - 

Db 707 KEILQSVNAIFKPGMINAIMGPSGSGKSSLLNLISGRLKSSVFAKFDTSGSIMFNDIQVS 766

Qy 123 RDQFQDCFSYVLQ-SDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSHV 181 

|:: III I I I : : I I I : I I I : I I I I : :: : :: III 

Db 767 ELMFKNVCSWSQDDDHLLAALTVKETLKYAAALRLHHLTEAERMERTDNLIRSLGLKHC 826

Qy 182 ADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELAR 241

: : | | : I I I I I : I I I : : I I I I I : : : I I I I I : I I I I : I : : I : I I 

Db 827 ENNIIGNEFVKGISGGEKRRVTMGVQLLNDPPILLLDEPTSGLDSFTSATILEILEKLCR 886 

Qy 242 -RDRIVIVTIHQPRSELFQHFDKIAILT-YGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299

: : : I : I I I I I I I I I I : I : : I I I I : I : I I : : I I I I I : I 
Db 887 EQGKTIIITIHQPRSELFKRFGNVLLLAKSGRTAFNGSPDEMIAYFTELGYNCPSFTNVA 946 

Qy 300 DFYMDLTSVDTQSREREIETYKRVQMLECAFKESDI YHKILEN IERARYLKT 351

I I :: I I I I : I I : : I I : II: : hi : ::l I: :l : 

Db 947 DFFLDLI SVNTQNEQNEI S SRARVEKILSAWKAN MDNESLSPTPISEKQQYSQE 1000 

Qy 352 LPMVPFK — TKDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQ 409 

: : | : I : : I |: ::: |: I :|: : |: 

Db 1001 SFFTEYSEFVRKPANLVLAYIVNVKRQFTTTRRSFDSLMARIAQIPGLGVIFALFFAPVK 1060

Qy 410 NNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYV 469

:| :: :|:|| : I : III : :| I : I 1=1 III: 

Db 1061 HNYT--SISNRLGLAQEST-ALYFVGMLGNLACYPTERDYFYEEYNDNVYGIAPFFLAYM 1117

Qy 470 LHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLI GEFLTLVLLGIVQ 526 

I I I : I : I ::: III I III: :: III:: : 

Db 1118 TLELPLSALASVLYAVFTVLACGL-PRTA— GNFFATVYCSFIVTCCGERLGIMTNTFFE 1174 

Qy 527 NPN-IVNSIVALLSI SGLL-IGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNE 580

I : I I I : I I I III: :l I II I I I" 

Db 1175 RPGFWNCISIILSIGTQMSGLMSLG MS RVLKG FN YLN P VG YT SMI I IN FA 1225 

Qy 581 FYG-LNFTC--GGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTANFLILYGFIP 633 

I I I II llh : I I I : | : | I : 

Db 1226 FPGNLKLTCEDGGKNS DGTCEFANGH DVLVSYGLVRNTQK 1265 

Qy 634 — ALVILGIVI FKVRDYLI 650 

: : : : | : : : : I 
Db 1266 YLGIIVCVAIIYRLIAFFI 1284 



RESULT 9 
ADP1_YEAST 

ID ADP1_YEAST STANDARD; PRT; 1049 AA. 

AC P25371; 

DT 01-MAY-1992 (Rel. 22, Created) 

DT 01-MAY-1992 (Rel. 22, Last sequence update) 



DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE Probable ATP-dependent permease precursor. 

GN ADP1 OR YCR011C OR YCR11C OR YCR105.

OS Saccharomyces cerevisiae (Baker's yeast).

OC Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;

OC Saccharomycetales; Saccharomycetaceae; Saccharomyces.

OX NCBI_TaxID=4932; 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=92160395; PubMed=1789009;

RA Purnelle B. , Skala J., Goffeau A.; 

RT "The product of the YCR105 gene located on the chromosome III from 

RT Saccharomyces cerevisiae presents homologies to ATP-dependent 

RT permeases."; 

RL Yeast 7:867-872(1991).

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=92327849; PubMed=1626432;

RA Skala J., Purnelle B., Goffeau A.; 

RT "The complete sequence of a 10.8 kb segment distal of SUF2 on the 

RT right arm of chromosome III from Saccharomyces cerevisiae reveals 

RT seven open reading frames including the RVS161, ADP1 and PGK genes.";

RL Yeast 8:409-417(1992). 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Potential). 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 


DR EMBL; X59720; CAA42328.1; 

DR PIR; S19421; S19421. 

DR GermOnline; 138916; -. 

DR SGD; S0000604; ADP1.

DR GO; GO:0005783; C:endoplasmic reticulum; IDA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Transmembrane; Glycoprotein; Transport; Signal. 
SQ SEQUENCE 1049 AA; 117231 MW; ABC9CE54BCFDF6A3 CRC64;

Query Match 17.9%; Score 602.5; DB 1; Length 1049;

Best Local Similarity 26.5%; Pred. No. 1.5e-35;
Matches 191; Conservative 130; Mismatches 227; Indels 173; Gaps 25;
38 SLGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLL 97
: | : : : | | | I | : : : I : : I : : I I I : I : I II : I I I I I I 

383 TLSFENITYSV PSINSDGVEE TVLNEISGIVKPGQILAIMGGSGAGKTTLL 433 

98 DAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLAL 157 

| :: : | : I I : I : I I I : I I : I I I I : I I I I I : : I : I I 

434 DILAMK-RKTGHVSGSIKVNGISMDRKSFSKIIGFVDQDDFLLPTLTVFETVLNSALLRL 492 

158 CRS-SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMM 216

: : | : :| |: M : : |::||: I I I I I : I I I I I I : I : I I : 

493 PKALSFEAKKARVTKVLEELRIIDIKDRIIGNEFDRGISGGEKRRVSIACELVTSPLVXF 552 

217 LDEPTTGLDCMTANQIVLLLAELAR-RDRIVIVTIHQPRSELFQHFDKIAILTYGELVFC 275 

11111:111 || :: I |: :| ::::IMIII :| III: :|: Ihh 
553 LDEPTSGLDASNANNVIECLVRLSSDYNRTLVLSIHQPRSNIFYLFDKLVLLSKGEMVYS 612 

276 GTPEEMLGFFNNCGYPCPEHSNPFDFYMDLT SVD- 309 

I ::: I I I I I I:: I I: :hl : l 

613 GNAKKVSEFLRNEGYICPDNYNIADYLIDITFEAGPQGKRRRIRNISDLEAGTDTNDIDN 672 

310 TQSREREIETYKRVQMLECA 329 

III ::l 

673 TIHQTTFTSSDGTTQREWAHLAAHRDEIRSLLRDEEDVEGTDGRRGATEIDLNTKLLHDK 732 

330 FKESDIYHKILENI ERARYLK-TLPMVPFKTKDPPGMFGKLGVLLRRVTRNL 380 

: | : I I :: : I I : I II : I : I : I I : I : 

733 YKDSVYYAELSQEIEEVLSEGDEESNVLNGDLP TGQQSAGFLQQLSILNSRSFKNM 788 

381 MRNKQAVIMRLVQNLIMGLFL--IFYLLRVQNNTLKGAVQDRVGLLYQLVGATPYTG 435

|| : :: : ::: III ::| : :| : I |:|:|| : :: I :|| 
789 YRNPKLLLGNYLLTILLSLFLGTLYYNVSNDISG-FQNRMGLFFFILTYFGFVTFTG 844

436 MLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWT 490

: : | : | : : I : I III: I : I I : :: I : I 

845 L S S FALERI I FI KERSNN YYS P LAYYI SKIMS EWPLRWP PILLS LI VYPM 896 

491 LGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIV QNPNIVNSIVALLS ISGLL 544 

|| : | : : I : I I : : : I I : I : I : I : : I I I I I 

897 TGLNMKDNAF- FKCI GI LI LFNLGI SLEI LTI GI I FEDLNNS I ILSVLVLLGSLLFSGLF 955 



545 IGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEF-



-YGLNFTCGGSNTSM 595 



|:|| : I I I : I I I :: I I I I I I 
Db 956 INTKNITN VAFKYLKNFSVFYYAYESLLINEVKTLMLKERKYGLNI 1001 

Qy 596 LNHPMCAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDY 648

I I I I I : I I : : : I I : I : I 

Db 1002 EVP GAT 1 LST FGFWQNLVFDI KI LALFNWFLIMGY 1038 

Qy 649 L 649 

I 

Db 1039 L 1039 



RESULT 10 
WHIT_DROME 

ID WHIT_DROME STANDARD; PRT; 687 AA. 

AC P10090; Q9V3A2; Q9XY33; 

DT 01-MAR-1989 (Rel. 10, Created) 

DT 01-NOV-1991 (Rel. 20, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE White protein. 

GN W OR EG:BACN33B1.1 OR CG2759. 

OS Drosophila melanogaster (Fruit fly).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;

OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;

OC Ephydroidea; Drosophilidae; Drosophila.

OX NCBI_TaxID=7227;

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Head; 

RX MEDLINE=90221897; PubMed=2109311;

RA Pepling M. , Mount S.M.; 

RT "Sequence of a cDNA from the Drosophila melanogaster white gene."; 

RL Nucleic Acids Res. 18:1633-1633(1990). 

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=85134865; PubMed=6084717;

RA O'Hare K., Murphy C., Levis R., Rubin G.M.;

RT "DNA sequence of the white locus of Drosophila melanogaster.";

RL J. Mol. Biol. 180:437-455(1984).
RN [3]

RP SEQUENCE FROM N.A.

RX MEDLINE=21100348; PubMed=11156992;

RA Lukacsovich T., Asztalos Z., Awano W., Baba K., Kondo S., Niwa S., 

RA Yamamoto D.; 

RT "Dual-tagging gene trap of novel genes in Drosophila melanogaster."; 

RL Genetics 157:727-742(2001). 

RN [4] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Berkeley; 

RX MEDLINE=20196006; PubMed=10731132;

RA Adams M.D., Celniker S.E., Holt R.A., Evans C.A., Gocayne J.D.,

RA Amanatides P.G., Scherer S.E., Li P.W., Hoskins R.A., Galle R.F.,

RA George R.A., Lewis S.E., Richards S., Ashburner M., Henderson S.N.,

RA Sutton G.G., Wortman J.R., Yandell M.D., Zhang Q., Chen L.X.,

RA Brandon R.C., Rogers Y.-H.C., Blazej R.G., Champe M., Pfeiffer B.D.,

RA Wan K.H., Doyle C., Baxter E.G., Helt G., Nelson C.R., Miklos G.L.G.,

RA Abril J.F., Agbayani A., An H.-J., Andrews-Pfannkoch C., Baldwin D.,

RA Ballew R.M., Basu A., Baxendale J., Bayraktaroglu L., Beasley E.M.,

RA Beeson K.Y., Benos P.V., Berman B.P., Bhandari D., Bolshakov S.,

RA Borkova D., Botchan M.R., Bouck J., Brokstein P., Brottier P.,

RA Burtis K.C., Busam D.A., Butler H., Cadieu E., Center A., Chandra I.,

RA Cherry J.M., Cawley S., Dahlke C., Davenport L.B., Davies P.,

RA de Pablos B., Delcher A., Deng Z., Mays A.D., Dew I., Dietz S.M.,

RA Dodson K., Doup L.E., Downes M., Dugan-Rocha S., Dunkov B.C., Dunn P.,

RA Durbin K.J., Evangelista C.C., Ferraz C., Ferriera S., Fleischmann W.,

RA Fosler C., Gabrielian A.E., Garg N.S., Gelbart W.M., Glasser K.,

RA Glodek A., Gong F., Gorrell J.H., Gu Z., Guan P., Harris M.,

RA Harris N.L., Harvey D.A., Heiman T.J., Hernandez J.R., Houck J.,

RA Hostin D., Houston K.A., Howland T.J., Wei M.-H., Ibegwam C.,

RA Jalali M., Kalush F., Karpen G.H., Ke Z., Kennison J.A., Ketchum K.A.,

RA Kimmel B.E., Kodira C.D., Kraft C., Kravitz S., Kulp D., Lai Z.,

RA Lasko P., Lei Y., Levitsky A.A., Li J.H., Li Z., Liang Y., Lin X.,

RA Liu X., Mattei B., McIntosh T.C., McLeod M.P., McPherson D.,

RA Merkulov G., Milshina N.V., Mobarry C., Morris J., Moshrefi A.,

RA Mount S.M., Moy M., Murphy B., Murphy L., Muzny D.M., Nelson D.L.,

RA Nelson D.R., Nelson K.A., Nixon K., Nusskern D.R., Pacleb J.M.,

RA Palazzolo M., Pittman G.S., Pan S., Pollard J., Puri V., Reese M.G.,

RA Reinert K., Remington K., Saunders R.D.C., Scheeler F., Shen

RA Shue B.C., Siden-Kiamos I., Simpson M., Skupski M.P., Smith T.,

RA Spier E., Spradling A.C., Stapleton M., Strong R., Sun E.,

RA Svirskas R., Tector C., Turner R., Venter E., Wang A.H., Wang X.,

RA Wang Z.-Y., Wassarman D.A., Weinstock G.M., Weissenbach J.,

RA Williams S.M., Woodage Worley K.C., Wu D., Yang S., Yao Q.A.,

RA Ye J., Yeh R.-F., Zaveri J.S., Zhan M., Zhang G., Zhao Q., Zheng L.,

RA Zheng X.H., Zhong F.N., Zhong W., Zhou X., Zhu S., Zhu X., Smith H.O.,

RA Gibbs R.A., Myers E.W., Rubin G.M., Venter J.C.;

RT "The genome sequence of Drosophila melanogaster.";

RL Science 287:2185-2195(2000). 

RN [5] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Oregon-R; 

RX MEDLINE=20196011; PubMed=10731137;

RA Benos P.V., Gatt M.K., Ashburner M., Murphy L., Harris D.,

RA Barrell B.G., Ferraz C., Vidal S., Brun C., Demailles J., Cadieu E.,

RA Dreano S., Gloux S., Lelaure V., Mottier S., Galibert F., Borkova D.,

RA Minana B., Kafatos F.C., Louis C., Siden-Kiamos I., Bolshakov S.,

RA Papagiannakis G., Spanos L., Cox S., Madueno E., de Pablos B.,

RA Modolell J., Peter A., Schoettler P., Werner M., Mourkioti F.,

RA Beinert N., Dowe G., Schaefer U., Jaeckle H., Bucheton A.,

RA Callister D.M., Campbell L.A., Darlamitsou A., Henderson N.S.,

RA McMillan P.J., Salles C., Tait E.A., Valenti P., Saunders R.D.C.,

RA Glover D.M.;

RT "From sequence to chromosome: the tip of the X chromosome of D.

RT melanogaster.";

RL Science 287:2220-2222(2000). 

RN [6] 

RP SEQUENCE OF 224-331 FROM N.A. 

RX MEDLINE=89339145; PubMed=2503416;

RA Tearle R.G., Belote J.M., McKeown M., Baker B.S., Howells A.J.;
RT "Cloning and characterization of the scarlet gene of Drosophila

RT melanogaster.";

RL Genetics 122:595-606(1989). 

CC -!- FUNCTION: Part of a membrane-spanning permease system necessary
CC for the transport of pigment precursors into pigment cells
CC responsible for eye color. White dimerizes with brown for the
CC transport of guanine and with scarlet for the transport of
CC tryptophan.

CC -!- SUBUNIT: Heterodimer of white with either brown or scarlet.
CC -!- SUBCELLULAR LOCATION: Integral membrane protein.

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily.

CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

DR EMBL; X51749; CAA36038.1; -.
DR EMBL; X02974; CAA26716.1; -.
DR EMBL; AB028139; BAA78210.1; -.
DR EMBL; AE003425; AAF45826.1; -.
DR EMBL; AL133506; CAB65847.1;
DR EMBL; X76202; CAA53795.1; -.
DR PIR; S08635; FYFFW.
DR FlyBase; FBgn0003996; w.

DR GO; GO:0004888; F:transmembrane receptor activity; NAS.

DR GO; GO:0006727; P:ommochrome biosynthesis; IMP.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport.
FT NP_BIND 130 137 ATP (BY SIMILARITY).

FT TRANSMEM 435 453 POTENTIAL.
FT TRANSMEM … … POTENTIAL.

FT TRANSMEM … … POTENTIAL.

FT TRANSMEM … 563 POTENTIAL.


FT TRANSMEM 576 594 POTENTIAL.

FT TRANSMEM 659 678 POTENTIAL.

FT CONFLICT 25 29 GDSGA -> LIFEIPYHCRVTAD (IN REF. 2 AND 3).

FT CONFLICT 49 49 L -> R (IN REF. 4 AND 5).

FT CONFLICT 335 371 VGAQCPTNYNPADFYVQVLAVVPGREIESRDRIAKIC ->

FT ITLHLNSYPAWVPSVLPTTIRRTFTYRCWPLCPDGRSSPVIGSPRYG (IN REF. 3).

SQ SEQUENCE 687 AA; 75672 MW; 24AFAD799DE0D396 CRC64;



Query Match 17.5%; Score 589; DB 1; Length 687;

Best Local Similarity 27.5%; Pred. No. 8e-35;

Matches 200; Conservative 129; Mismatches 255; Indels 142; Gaps 25;



Qy 11 GARGP--HINRGSLSSLEQ GSVTG TEAR 36

I:: I 1:1 I : I II : I I I

Db 13 GSKHPSAEHLNNGDSGAASQSCINQGFGQAKNYGTLLPPSPPEDSGSGSGQLAENLTYAW 72



Qy 37 HSLGVLHVSYSVSNRVGPWWNI KSCQQKW DRQILKDVSLYIESGQIMCI 85 

|:: : : I : I I I :: : :||:| M" : 

Db 73 HNMDI FGAVNQ P G S GWRQ LVN RT RGL FCNE RH I PAP RKHL L KNVC GVAYP GE LLAV 128

Qy 86 LGSSGSGKTTLLDAISGRLRR— TGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSL 143 

: | M | : | | I I I I : I : : I : : I : I I : : I : I I I I : I : I I 
Db 129 MGSSGAGKTTLLNALAFRSPQGIQVSPSGMRLLNGQPVDAKEMQARCAYVQQDDLFIGSL 188 

Qy 144 TVRETLRYTAMLALCRSSADFYNK KVEAVMTELSLSHVADQMIG-SYNFGGISSGER 199 

I I I I : I I : : I I : : I : I : I I I I I : I I I : I I I I 

Db 189 TAREHLIFQAMVRMPRHLT — YRQRVARVDQVIQELSLSKCQHTIIGVPGRVKGLSGGER 246 

Qy 200 RRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDR 259

: I : : I : : I II : : : II I I M I I II: :l :l :|::: : I I - 1 I I I I 1111 = 
Db 247 KRLAFASEALTDPPLLICDEPTSGLDSFTAHSWQVLKKLSQKGKTVILTIHQPSSELFE 306 

Qy 260 HFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIET 319 

MM:: I : I I I I I : I I : I 11 = II 111= - =1 MM: 
Db 307 L FD KI L LMAEGRVAFL GT P S EAVD FF S YVGAQC PTN YN PAD F YVQVLAV VPGREIES 363 

Qy 320 YKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTKDPP GMFGKLGV 371 

I : : II M : : M II I : I I : 

Db 364 RDRIAKICDNFAIS KVARDMEQLLATKNLE KPLEQPENGYTYKATWFMQFRA 415 

Qy 372 LLRRVT RN LMRN KQAVI MRLVQN L I MGL FL I F YL L RVQNNT LKGAVQ D RVGL L YQLVGAT 431 

Ml : : : : I : M M : : : I I : I I I I : I : : : 
Db 416 VLWRSWLSVLKEPLLVKVRLIQTTMVAI-LIGLIFLGQQLTQVG-VMNINGAIFLFLTNM 473 

Qy 432 PYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTL 491 

: : :|:| | : | : : I I I : I I : : : I : : : I : 

Db 474 TFQNVFATINVFTSELPVFMREARSRLYRCDTYFLGKTIAELPLFLTVPLVFTAIAYPMI 533 

Qy 492 GLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLS 539

M II I I I : : I I : I hi 

Db 534 GLRAGVLHF — FNCLALVTLV — ANVSTSFGYLISCASSSTSMALSV 576 

Qy 540 ISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGL NFTCGGSN 592 

| | : || I M : I I I I i : : : I I I : M : : : M I I 

Db 577 GPPVI I PFLLFGGFFLNSGSVPVYLKWLS YLSWFRYANEGLLINQWADVEPGEI SCTS SN 636 

Qy 593 T SMLNH PMCAI TQGVQ FI EKT C P GA TSRFTANFLI LYGFI PAL VI LGI VI FKVR 646 

| III : I IM I I :: I II II IM 

Db 637 T TCPSSGKVILETLNFSAADLPL-DYV-GLAIL-IVS FRVL 674 



Qy 647 DYLISR 652 

I I I 

Db 675 AYLALR 680 
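The summary counts reported with each hit can be cross-checked against the percentages. A minimal sketch, assuming that Best Local Similarity is Matches divided by the total of Matches, Conservative, Mismatches and Indels (a relationship inferred from the figures in this report, not taken from the search tool's documentation):

```python
def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Return identity as a percentage of aligned columns.

    Assumption (inferred from this report, not documented): the
    denominator is matches + conservative + mismatches + indels.
    """
    aligned_columns = matches + conservative + mismatches + indels
    return 100.0 * matches / aligned_columns


# Counts reported for the Drosophila white hit above (200 of 726 columns)
print(f"{best_local_similarity(200, 129, 255, 142):.1f}%")  # 27.5%
```

Under the same assumption, the counts for ABG1_HUMAN (165 of 584) and YPC3_CAEEL (154 of 607) give the reported 28.3% and 25.4%.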



RESULT 11 
ABG1_HUMAN

ID ABG1_HUMAN STANDARD; PRT; 678 AA.

AC P45844; Q9BXK6; Q9BXK7; Q9BXK8; Q9BXK9; Q9BXL0; Q9BXL1; Q9BXL2;

AC Q9BXL3; Q9BXL4; 

DT 01-NOV-1995 (Rel. 32, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 



DT 15-MAR-2004 (Rel. 43, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 1 (White protein homolog) 

DE (ATP-binding cassette transporter 8).

GN ABCG1 OR ABC8 OR WHT1.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606;

RN [1] 

RP SEQUENCE OF 3-678 FROM N.A. (ISOFORMS 1 AND 4).

RC TISSUE=Retina;

RX MEDLINE=96256850; PubMed=8659545;

RA Chen H.M., Rossier C., Lalioti M.D., Lynn A., Chakravarti A.,

RA Perrin G., Antonarakis S.E.;

RT "Cloning of the cDNA for a human homologue of the Drosophila white

RT gene and mapping to chromosome 21q22.3.";

RL Am. J. Hum. Genet. 59:66-75(1996). 

RN [2] 

RP SEQUENCE FROM N.A. (ISOFORM 1).

RX MEDLINE=20289799; PubMed=10830953;

RA Hattori M., Fujiyama A., Taylor T.D., Watanabe H., Yada T.,

RA Park H.-S., Toyoda A., Ishii K., Totoki Y., Choi D.-K., Groner Y.,

RA Soeda E., Ohki M., Takagi T., Sakaki Y., Taudien S., Blechschmidt K.,

RA Polley A., Menzel U., Delabar J., Kumpf K., Lehmann R., Patterson D.,

RA Reichwald K., Rump A., Schillhabel M., Schudy A., Zimmermann W.,

RA Rosenthal A., Kudoh J., Shibuya K., Kawasaki K., Asakawa S.,

RA Shintani A., Sasaki T., Nagamine K., Mitsuyama S., Antonarakis S.E.,

RA Minoshima S., Shimizu N., Nordsiek G., Hornischer K., Brandt P.,

RA Scharfe M., Schoen O., Desario A., Reichelt J., Kauer G., Bloecker H.,

RA Ramser J., Beck A., Klages S., Hennig S., Riesselmann L., Dagand E.,

RA Wehrmeyer S., Borzym K., Gardiner K., Nizetic D., Francis F.,

RA Lehrach H., Reinhardt R., Yaspo M.-L.;

RT "The DNA sequence of human chromosome 21."; 

RL Nature 405:311-319(2000). 

RN [3] 

RP SEQUENCE FROM N.A. (ISOFORM 1).

RX MEDLINE=20408883; PubMed=10950923;

RA Berry A., Scott H.S., Kudoh J., Talior I., Korostishevsky M.,

RA Wattenhofer M., Guipponi M., Barras C., Rossier C., Shibuya K.,

RA Wang J., Kawasaki K., Asakawa S., Minoshima S., Shimizu N.,

RA Antonarakis S.E., Bonne-Tamir B.;

RT "Refined localization of autosomal recessive nonsyndromic deafness

RT DFNB10 locus using 34 novel microsatellite markers, genomic

RT structure, and exclusion of six known genes in the region.";

RL Genomics 68:22-29(2000). 

RN [4] 

RP SEQUENCE FROM N.A. (ISOFORM 1).

RX MEDLINE=21192304; PubMed=11279031;

RA Porsch-Oezcueruemez M., Langmann T., Heimerl S., Borsukova H.,

RA Kaminski W.E., Drobnik W., Honer C., Schumacher C., Schmitz G.;

RT "The zinc finger protein 202 (ZNF202) is a transcriptional repressor

RT of ATP binding cassette transporter A1 (ABCA1) and ABCG1 gene

RT expression and a modulator of cellular lipid efflux."; 

RL J. Biol. Chem. 276:12427-12433(2001). 

RN [5] 

RP SEQUENCE FROM N.A. (ISOFORMS 2; 3; 4; 5; 6 AND 7). 

RX MEDLINE=21092576; PubMed=11162488;

RA Lorkowski S., Rust S., Engel T., Jung E., Tegelkamp K., Galinski E.A.,

RA Assmann G., Cullen P.;

RT "Genomic sequence and structure of the human ABCG1 (ABC8) gene.";

RL Biochem. Biophys. Res. Commun. 280:121-131(2001).

RN [6] 

RP SEQUENCE OF 33-678 FROM N.A. 

RC TISSUE=Fetal brain; 

RX MEDLINE=97186700; PubMed=9034316;

RA Croop J.M., Tiller G.E., Fletcher J.A., Lux M.L., Raab E.,

RA Goldenson D., Arciniegas S., Son D., Wu R.;

RT "Isolation and characterization of a mammalian homolog of the 

RT Drosophila white gene."; 

RL Gene 185:77-85(1997). 

RN [7] 

RP INDUCTION, AND PROBABLE FUNCTION. 

RX MEDLINE=20261604; PubMed=10799558;

RA Venkateswaran A., Repa J.J., Lobaccaro J.-M.A., Bronson A.,

RA Mangelsdorf D.J., Edwards P.A.;

RT "Human white/murine ABC8 mRNA levels are highly induced in 

RT lipid-loaded macrophages. A transcriptional role for specific 

RT oxysterols.";

RL J. Biol. Chem. 275:14700-14707(2000). 

RN [8] 

RP INDUCTION, AND PROBABLE FUNCTION. 

RX MEDLINE=20105556; PubMed=10639163;

RA Klucken J., Buechler C., Orso E., Kaminski W.E.,

RA Porsch-Oezcueruemez M., Liebisch G., Kapinsky M., Diederich W.,

RA Drobnik W., Dean M., Allikmets R., Schmitz G.;

RT "ABCG1 (ABC8), the human homolog of the Drosophila white gene, is a

RT regulator of macrophage cholesterol and phospholipid transport."; 

RL Proc. Natl. Acad. Sci. U.S.A. 97:817-822(2000). 

RN [9] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207;

RA Schmitz G., Langmann T., Heimerl S.;

RT "Role of ABCG1 and other ABCG family members in lipid metabolism.";

RL J. Lipid Res. 42:1513-1520(2001). 

CC -!- FUNCTION: Transporter involved in macrophage lipid homeostasis. Is 
CC an active component of the macrophage lipid export complex. Could 

CC also be involved in intracellular lipid transport processes. The 

CC role in cellular lipid homeostasis may not be limited to 

CC macrophages.

CC -!- SUBUNIT: May form heterodimers with several heterologous partners 
CC of the ABCG subfamily. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. Predominantly 
CC localized in the intracellular compartments mainly associated with 

CC the endoplasmic reticulum (ER) and Golgi membranes. 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=7; 

CC Comment=Additional isoforms seem to exist; 

CC Name=1;

CC IsoId=P45844-1; Sequence=Displayed;

CC Name=2; Synonyms=J;

CC IsoId=P45844-2; Sequence=VSP_000047, VSP_000051;

CC Name=3; Synonyms=ABDE;

CC IsoId=P45844-3; Sequence=VSP_000048, VSP_000051;

CC Name=4; Synonyms=G;

CC IsoId=P45844-4; Sequence=VSP_000051;

CC Name=5; Synonyms=F;

CC IsoId=P45844-5; Sequence=VSP_000049, VSP_000051;

CC Name=6; Synonyms=HI;

CC IsoId=P45844-6; Sequence=VSP_000046, VSP_000051;

CC Name=7; Synonyms=C;

CC IsoId=P45844-7; Sequence=VSP_000050, VSP_000051;

CC -!- TISSUE SPECIFICITY: Expressed in several tissues.

CC -!- INDUCTION: Strongly induced in monocyte-derived macrophages during 
CC cholesterol influx. Conversely, mRNA and protein expression are 

CC suppressed by lipid efflux. Induction is mediated by the liver X 

CC receptor/retinoid X receptor (LXR/RXR) pathway.

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

Query Match 17.4%; Score 586; DB 1; Length 678;

Best Local Similarity 28.3%; Pred. No. 1.3e-34; 

Matches 165; Conservative 125; Mismatches 240; Indels 54; Gaps 15; 

Qy 23 SSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKSCQQKWDRQILKDVSLYIESGQI 82 

Ml:: || : I I I I I I I I : : I : : I I : I I I : : 

Db 67 SSLPRRAAVNIEFR DLSYSVPE--GPWW RKKGYKTLLKGISGKFNSGEL 113 

Qy 83 MCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQ — FQDCFSYVLQSDVFL 140 

: I : I I I : I I : I I : : : : I I I I : : I I : M III I : I : : I I : I 
Db 114 VAI MG P S GAG KS TLMN I LAG- YRET G-MKGAVL I N G — LPRDLRCFRKVSCYIMQDDMLL 169 

Qy 141 SSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERR 200

| | | : | : : I I I : : : I : : : I I I I : II : I I : I : 

Db 170 PHLTVQEAMMVSAHLKL-QEKDEGRREMVKEILTALGLLSCANTRTGS LSGGQRK 223 

Qy 201 RVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQ 260

I : : I I : I : : I I I I I I I : I I I : I : I I : I I : I : I M I I I : : I I : 
Db 224 RLAIALELVNNPPVMFFDEPTSGLDSASCFQWSLMKGLAQGGRSIICTIHQPSAKLFEL 283 

Qy 261 FDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETY 320 

| | :: : I : I : I : I : : : : I I I : I I I I I :: I : : : 
Db 284 FDQLYVLSQGQCVTRGKVCNLVPYLRDLGLNCPTYHNPADFVMEVASGEYGDQNSRLVRA 343 

Qy 321 KRVQMLECAFKES DI YHKI LENI ERARYLKTLPMVPFKTKDP PGMFG 367 

II: I : : | : | : : : : | | | II II 

Db 344 VREGMCDS DHKRDLGGDAEVNP FLWHRP S EEVKQTKRLKGL RKDSSSMEGCHSF 397 



Qy 368 KL GVL L RRVT RN LMRN KQAVI MRLVQN L I MGL FL I F YL LRVQNNT L KGAVQDRV 421 

: :| :| ::||: :|: :: :|| : I : I I I 

Db 398 SASCLTQFCILFKRTFLSIMRDSVLTHLRITSHIGIGLLIGLLYLGIGNEAKK— VLSNS 455 



Qy 422 GLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIATV 481

||:: : :: | | |: | :| : 1 1 1 : :| 1 :: 1

Db 456 GFLFFSMLFLMFAALMPTVLTFPLEMGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPV 515

Qy 482 IFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSIS 541

-1*11 r || 1:1 | : : 1 1 1 : 1 | : | : :

Db 516 AYCSIVYWMTSQPSDAVRFVLFAALGTMTSLVAQSLGL-LIGAASTSLQVATFVGPVTAI 574

Qy 542 GLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGLN 585

:|: Ml : :| |: : 1 :: :l 1 : = - III:

Db 575 PVLLFSGFFVSFDTIPTYLQWMSYISYVRYGFEGVILS-IYGLD 617





RESULT 12 
YPC3_CAEEL 

ID YPC3_CAEEL STANDARD; PRT; 598 AA. 

AC Q11180; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE Putative ABC transporter C05D10.3 in chromosome III. 

GN C05D10.3. 

OS Caenorhabditis elegans. 

OC Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea; 

OC Rhabditidae; Peloderinae; Caenorhabditis. 

OX NCBI_TaxID=6239; 

RN [1]

RP SEQUENCE FROM N.A. 

RC STRAIN=Bristol N2; 

RA Du Z.;

RL Submitted (AUG-1994) to the EMBL/GenBank/DDBJ databases.

RN [2]

RP REVISIONS.

RA Waterston R.;

RL Submitted (SEP-2001) to the EMBL/GenBank/DDBJ databases.

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Potential). 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U13645; AAA20989.2; -. 

DR WormPep; C05D10.3; CE29170. 

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; FALSE_NEG.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.



KW Hypothetical protein; ATP-binding; Transmembrane; Transport. 

FT NP_BIND 27 34 ATP (POTENTIAL).

FT TRANSMEM 336 356 POTENTIAL.

FT TRANSMEM 425 445 POTENTIAL.

FT TRANSMEM 453 473 POTENTIAL.

FT TRANSMEM 478 498 POTENTIAL. 

SQ SEQUENCE 598 AA; 66906 MW; 9D6414E06898E343 CRC64; 

Query Match 17.3%; Score 584.5; DB 1; Length 598; 

Best Local Similarity 25.4%; Pred. No. 1.4e-34; 

Matches 154; Conservative 137; Mismatches 273; Indels 43; Gaps 12; 

Qy 67 RQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQF 126

: : I I : I I I I I : : : I I I I I I : I I I I I = = : : I : : I : : : I : : 

Db 7 KEI LHNVS GMAES GKLLAI LGS S GAGKTTLMNVLTS RNLTNLDVQGS I LI DGRRANKWKI 66 

Qy 127 QDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNK KVEAVMTELSLSHV 181 

:: ::| I |:|: ::| II I:: I I I :l= =11 l:|:: I 

Db 67 REMSAFVQQHDMFVGTMTAREHLQFMARL— RMGDQYYSDHERQLRVEQVLTQMGLKKC 123 

Qy 182 ADQMIGSYN-FGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

| | : | | | | : | | I : : I : I I : : : I I I : : I I I I : II I I : I I I I 
Db 124 ADTVIGIPNQLKGLSCGEKKRLSFASEILTCPKILFCDEPTSGLDAFMAGHWQALRSLA 183 

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

11:1 :: I: : I I :: : I MINI : M I 

Db 184 DNGMTVIITIHQPSSHVYSLFNNVCLMACGRVIYLGPGDQAVPLFEKCGYPCPAYYNPAD 243 

Qy 301 FYMDLTSVDTQSREREI ET YKRVQMLECAFKESDI YHKI LENI ERARYLKTLPMV-- 355

: | | : : | : : : I : I : : I I I I : I 

Db 244 HL I RT LAVI D S DRAT SMKT I S KI RQ GFL S T DLGQ S VLA- 1 GNANKLRAAS FVT GS DT 299 

Qy 356 PFKTKDPPGMF-GKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLF LIFYLLR 407

| : | | : II ::|: : :||:l II "M 
Db 300 SEKTKTFFNQDYNAS FWTQFLALFWRSWLTVI RDPNLLSVRLLQILITAFITGIVFF 356 

Qy 408 VQNNT L KGAVQ DRVGL L YQ L VGAT P YT GMLN AVN L F PML RAVS DQ ESQDGLYHK 461

| : |::: : :| : || : :: : |: :|:| 

Db 357 -QT PVT PAT 1 1 S I NGIMFN HIRNMNFMLQFPNVPVITAELPIVLRENANGVYRT 409 

Qy 462 WQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVL 521 

|| : || : | :::::: I I MM : : I : : ' ' : 
Db 410 SAYFLAKNIAELPQYIILPILYNTIVYWMSGLYPNFWNYCFASLWILITNVAISISYAV 469 

Qy 522 LGIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEF 581 

| | : : : | : : : : III : I I I : : I I I I : I I : 

Db 470 ATIFANTDVAMTILPIFWPIMAFG-GFFITFDAIPSYFKWLSSLSYFKYGYEALAINEW 528 

Qy 582 YGLNFTCGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIV 641 

: |:|| : : : : :: : I : MM : I: I 

Db 529 DSIKVI PECFNSSMTAFALDSCPKNGHQVLESI DFSASHKI FDI S I LFGMFI GI RI IAYV 588 



Qy 642 IFKVRDY 648

: I I

Db 589 ALLIRSY 595



RESULT 13 
WHIT_LUCCU 

ID WHIT_LUCCU STANDARD; PRT; 677 AA. 

AC Q05360; 

DT 01-FEB-1995 (Rel. 31, Created) 

DT 01-NOV-1997 (Rel. 35, Last sequence update) 

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE White protein. 

GN W. 

OS Lucilia cuprina (Greenbottle fly) (Australian sheep blowfly).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; 

OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Oestroidea; 

OC Calliphoridae; Lucilia. 

OX NCBI_TaxID=7375; 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=97087158; PubMed=8933176;

RA Garcia R.L., Perkins H.D., Howells A.J.;

RT "The structure, sequence and developmental pattern of expression of

RT the white gene in the blowfly Lucilia cuprina.";

RL Insect Mol. Biol. 5:251-260(1996).

RN [2] 

RP SEQUENCE OF 490-584 FROM N.A. 

RX MEDLINE=90264941; PubMed=1971656;

RA Elizur A., Vacek A.T., Howells A.J.; 

RT "Cloning and characterization of the white and topaz eye color genes 

RT from the sheep blowfly Lucilia cuprina."; 

RL J. Mol. Evol. 30:347-358(1990). 

CC -!- FUNCTION: May be part of a membrane-spanning permease system 
CC necessary for the transport of pigment precursors into pigment 

CC cells responsible for eye color. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein.

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC

DR EMBL; U38899; AAA82057.1; 

DR EMBL; X53265; CAA37365.1; -. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport. 

FT NP_BIND 119 126 ATP (POTENTIAL). 

FT TRANSMEM 431 451 POTENTIAL. 

FT TRANSMEM 456 476 POTENTIAL. 



FT TRANSMEM 506 526 POTENTIAL. 

FT TRANSMEM 534 554 POTENTIAL. 

FT TRANSMEM 563 583 POTENTIAL. 

FT TRANSMEM 647 667 POTENTIAL. 

SQ SEQUENCE 677 AA; 75365 MW; D16FC11C97EED51D CRC64; 

Query Match 17.3%; Score 583.5; DB 1; Length 677; 

Best Local Similarity 28.6%; Pred. No. 2e-34; 

Matches 191; Conservative 132; Mismatches 267; Indels 79; Gaps 21; 

Qy 20 GSLSSLEQGSVTGTEARHSL— GVLHVSYS VSNRV-GPWWNIKSCQQKWDRQILK 71 

I I I I I I I : : I I : I I : I I I I : I : I : : : I 

Db 45 GSLVSNESASEKLTYSWCNLDVFGEVHQPGSNWKQLVNRVKGVFCNERHI-PKPRKHLIK 103

Qy 72 DVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEV — FVNGCELRRDQFQDC 129 

: | | : : : : : I I I I : I I I I I I : I : : I I : : I I : : I 

Db 104 NVCGVAYPGELIAVMGSSGAGKTTLLNALAFRSARGVQISPSSV™LNGHPVT»7VKEMQAR 163 

Qy 130 FS YVLQ S DVFL S S LTVRET LRYTAMLALCRS S ADFYN- KKVEAVMT EL S L S HVADQMI G- 187 

: | | | I : I : I I I I I | : | : : I : :: I : I : : I I I : : I I 

Db 164 CAYVQQDDLFIGSLTAREHLIFQATVRMPRTMTQKQKLQRVDQVIQDLSLIKCQNTIIGV 223 

Qy 188 SYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVI 247

| : | | | | : | : : I : : I I I : : : I I I I : I I I I : I : I : I : : I : I I 
Db 224 PGRVKGLSGGERKRLAFASEALTDPPLLICDEPTSGLDSFMAASWQVLKKLSQRGKTVI 283 

Qy 248 VTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTS 307 

: | | | | | I I II : I I I I :: I : I III I : II: I M : II III: : : 
Db 284 LTIHQPSSELFELFDKILLMAEGRVAFLGTPVmVDFFSFIGAQCPTNYNPADFYVQVLA 343 

Qy 308 VDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKT KD- 361

| | | || : | : : I : : : : I : : : I I II 

Db 344 V VPGREIESRDRISKICDNFAVGKVSREMEQNFQK IAAKTDGLQKDDE 391

Qy 362 PPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGA 416 

| : : : I : : : I : I I : I : : : I I : I I I 

Db 392 TTILYKASWFTQFRAIMWRSWISTLKEPLLVKVRLIQTTMVAV-LIGLIFLNQPMTQVG- 449

Qy 417 VQDRVGLL YQLVGAT P YT GMLN AVN L FPML RAVS DQ E S Q D GL YH KWQML LAYVLHVL P F S 476 

| : | : : : : : : I : I | : I : : I I I III 

Db 450 VMNINGAIFLFLTNMTFQNVFAVINVFTSELPVFMRETRSRLYRCDTYFLGKTLAELPLF 509 

Qy 477 VIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLT LVLLGIVQNPNIVNS 533 

:: :| :: I :|| I : :| :|| I: I - : :: I 

Db 510 LWP FLFIAIAYPMI GLRPGIT HFLSALALVTLVANVSTSFGYLISCASTSTSMALS 566

Qy 534 IVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGL NFTCGG 590 

: |:| Ml |:| :|: I I II" :l I I — I : : : M 
Db 567 VGPPLTIPFLLFGGVFL-NSGSVPVYFKWLSYFSWFRYANEGLLINQWADVQPGEITCTS 625 

Qy 591 SNT SMLNH PMCAI TQGVQ FI EKT C P — GAT S RFTAN F LI LYGFI PALVILGI VI F 643 

:|| Mil III Ml :ll 1:11 

Db 626 TNT TCPSSGXVXLETLNFRDKFTFRLYG LILLILIF 661 



Qy 644 KVRDYLISR 652

: : | : :

Db 662 RIAGYVAXK 670



RESULT 14 
WHIT_ANOGA 

ID WHIT_ANOGA STANDARD; PRT; 695 AA. 

AC Q27256; Q17006; 

DT 01-NOV-1997 (Rel. 35, Created)

DT 01-NOV-1997 (Rel. 35, Last sequence update) 

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE White protein. 

GN W. 

OS Anopheles gambiae (African malaria mosquito).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; 

OC Neoptera; Endopterygota; Diptera; Nematocera; Culicoidea; Anopheles.

OX NCBI_TaxID=7165; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Suakoko / G3; 

RX MEDLINE=96423158; PubMed=8825759;

RA Besansky N.J., Bedell J.A., Benedict M.Q., Mukabayire O., Hilfiker D.,

RA Collins F.H.;

RT "Cloning and characterization of the white gene from Anopheles 

RT gambiae."; 

RL Insect Mol. Biol. 4:217-231(1995). 

CC -!- FUNCTION: May be part of a membrane-spanning permease system
CC necessary for the transport of pigment precursors into pigment 

CC cells responsible for eye color. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U29486; AAC46995.1; -. 

DR EMBL; U29485; AAC46994.1; -. 

DR EMBL; U29484; AAC47423.1; -. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR008965; Cellul_bind.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport. 

FT NP_BIND 133 140 ATP (POTENTIAL).

FT NP_BIND 288 295 ATP (POTENTIAL).

FT TRANSMEM 444 464 POTENTIAL.

FT TRANSMEM 474 494 POTENTIAL. 

FT TRANSMEM 524 544 POTENTIAL. 



FT TRANSMEM 552 572 POTENTIAL.

FT TRANSMEM 581 601 POTENTIAL.

FT TRANSMEM 669 689 POTENTIAL.

FT CARBOHYD 472 472 N-LINKED (GLCNAC...) (POTENTIAL).


FT CARBOHYD … … N-LINKED (GLCNAC...) (POTENTIAL).


FT CONFLICT 100 100 N -> S (IN REF. 1; AAC47423).

FT CONFLICT 691 693 SRS -> YAR (IN REF. 1; AAC47423).

SQ SEQUENCE 695 AA; 77218 MW; EE8B9517239B2961 CRC64;

Query Match 16.9%; Score 570.5; DB 1; Length 695;



Best Local Similarity 26.9%; Pred. No. 1.8e-33; 
Matches 174; Conservative 121; Mismatches 221; Indels 131; Gaps 20; 

Qy 58 IKSC — QQKWD RQI LKDVSLYI ESGQIMCI LGS SGSGKTTLLDAI SGRLRRTGTLE 111 

::: | : | : | : : I I : I : : I I : : : : : I I I I : I I I I I I : I : : I : 

Db 98 LRN C CT RQRKD FN P RKHLLKN VT GVAK S GE LLAVMG S S GAGKTT L LNALAFRS P P GVK I S 157

Qy 112 GEVF — VNGCELRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKK 168 

: | | : : I : : I I I I : I : I I I I I I : I I I : I I : 
Db 158 PNAVRALNGVPVNAEQLRARCAYVQQDDLFIPSLTTREHLLFQAMLRMGRDVPASVKQHR 217 

Qy 169 VEAVMTELSLSHVADQMIGS-YNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCM 227

| : | : | | | I || : I I : I : I I I I : I : : I : : I I I : : : I I M : I I I 

Db 218 VQEVLQELSLVKCADTIIGAPGRIKGLSGGERKRLAFASETLTDPHLLLCDEPTSGLDSF 277 

Qy 228 TANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNN 287

| : :: : I : I : : : I : I I I I I I I I : MM:: | : I I : I : II : 
Db 278 MAHS VLQVLKGMAMKGKT 1 1 LT I HQP S SEL YCLFDKI LLVAEGRVAFLGS P YQSAEFFSQ 337 

Qy 288 CGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERAR 347 

| M I : I I M I : : : : : Ml : : : M II : : I I 
Db 338 LGIPCPPNYN PAD F YVQMLAI APAKEAECRDMI KKI CDS FAVS PI AREVLETAS 391 

Qy 348 YLKTLPMVPFKTKDPPGMF GKLG VLLRRVTRNLMRNKQAVIMRL 391 

I I I I I II Ml ::=:: I Ml 

Db 392 VAGKGMDEPYMLQQVEGVGSTGYRSSWWTQFYCILWRSWLSVLKDPMLVKVRL 444 

Qy 392 VQNLIMGLFLI FYLLRVQNNTLKGAV QDRV GLL YQLVGAT P YT GMLNAV 440 

: | : : III:: I I I I I : : : : : 

Db 445 LQTAMVA T L I G S I Y FGQ VLDQ D GVMN I N G S L FL FLTNMT FQN VFAVI 491 

Qy 441 NLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARF 500 

|:| I M : II I : II : MM: I Ml I 
Db 492 NVFSAELPVFLREKRSRLYRVDTYFLGKTIAELPLFIAVPFVFTSITYPMIGL RT 546



Qy 501 GYFSAALLAPHLI GEFLTLVLLGIVQNPN IVNSIVALLSIS GLLIG 546

| ||: II :: M I : Ml II: M 

Db 547 G ATHYL TTLFIVTLVANVSTSFGYLISCASSSISMALSVGPPWIPFLIF 596 



Qy 547 SGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGL NFTCGGSNTSMLNHPMCAI 603 

II I M I I I :: M I I : M : : : I M 
Db 597 GGFFLNSASVPAYFKYLSYLSWFRYANEALLINQWSTWDGEIACTRANV 646

Qy 604 TQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLI 650 

|||: Ml I M I : : 

Db 647 TCPRSE IILETFNFRVEDFAL 667 



RESULT 15 
ABG4_HUMAN 

ID ABG4_HUMAN STANDARD; PRT; 646 AA. 

AC Q9H172; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 4. 

GN ABCG4 OR WHITE2.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=21518231; PubMed=11606068;

RA Engel T., Lorkowski S., Lueken A., Rust S., Schlueter B., Berger G.,

RA Cullen P., Assmann G.; 

RT "The human ABCG4 gene is regulated by oxysterols and retinoids in 

RT monocyte-derived macrophages.";

RL Biochem. Biophys. Res. Commun. 288:483-488(2001). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Brain; 

RX MEDLINE=22388257; PubMed=12477932;

RA Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G., 

RA Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D.,

RA Altschul S.F., Zeeberg B., Buetow K.H., Schaefer C.F., Bhat N.K.,

RA Hopkins R.F., Jordan H., Moore T., Max S.I., Wang J., Hsieh F.,

RA Diatchenko L., Marusina K., Farmer A.A., Rubin G.M., Hong L.,

RA Stapleton M., Soares M.B., Bonaldo M.F., Casavant T.L., Scheetz T.E.,

RA Brownstein M.J., Usdin T.B., Toshiyuki S., Carninci P., Prange C.,

RA Raha S.S., Loquellano N.A., Peters G.J., Abramson R.D., Mullahy S.J.,

RA Bosak S.A., McEwan P.J., McKernan K.J., Malek J.A., Gunaratne P.H.,

RA Richards S., Worley K.C., Hale S., Garcia A.M., Gay L.J., Hulyk S.W.,

RA Villalon D.K., Muzny D.M., Sodergren E.J., Lu X., Gibbs R.A.,

RA Fahey J., Helton E., Ketteman M., Madan A., Rodrigues S., Sanchez A.,

RA Whiting M., Madan A., Young A.C., Shevchenko Y., Bouffard G.G., 

RA Blakesley R.W., Touchman J.W., Green E.D., Dickson M.C., 

RA Rodriguez A.C., Grimwood J., Schmutz J., Myers R.M., 

RA Butterfield Y.S.N., Krzywinski M.I., Skalska U., Smailus D.E., 

RA Schnerch A., Schein J.E., Jones S.J.M., Marra M.A.; 

RT "Generation and initial analysis of more than 15,000 full-length 

RT human and mouse cDNA sequences."; 

RL Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002). 

RN [3] 

RP SEQUENCE OF 20-646 FROM N.A. 

RC TISSUE=Dorsal root ganglion;

RX MEDLINE=22170423; PubMed=12183068;

RA Oldfield S., Lowry C., Ruddick J., Lightman S.;

RT "ABCG4: a novel human white family ABC-transporter expressed in the 

RT brain and eye."; 

RL Biochim. Biophys. Acta 1591:175-179(2002). 

CC -!- FUNCTION: May be involved in macrophage lipid homeostasis. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White)

CC subfamily.

CC -----------------------------------------------------------------------

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AJ308237; CAC87131.1; -.

DR EMBL; BC041091; AAH41091.1; 

DR EMBL; AJ300465; CAC17140.1; 

DR PIR; JC7777; JC7777. 

DR Genew; HGNC:13884; ABCG4.

DR MIM; 607784; 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1. 

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Glycoprotein; Transmembrane; Transport. 
FT   DOMAIN        1    393       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    394    414       1 (POTENTIAL).
FT   DOMAIN      415    425       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    426    446       2 (POTENTIAL).
FT   DOMAIN      447    472       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    473    493       3 (POTENTIAL).
FT   DOMAIN      494    503       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    504    524       4 (POTENTIAL).
FT   DOMAIN      525    532       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    533    553       5 (POTENTIAL).
FT   DOMAIN      554    617       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    618    638       6 (POTENTIAL).
FT   DOMAIN      639    646       CYTOPLASMIC (POTENTIAL).
FT   NP_BIND     102    109       ATP (POTENTIAL).
FT   CARBOHYD    422    422       N-LINKED (GLCNAC...) (POTENTIAL).
SQ   SEQUENCE   646 AA;  71895 MW;  9CCEC6E150772611 CRC64;

Query Match 16.9%; Score 569.5; DB 1; Length 646;

Best Local Similarity 26.0%; Pred. No. 1.9e-33;

Matches 174; Conservative 138; Mismatches 273; Indels 83; Gaps 19;

Qy 10 EGARGP HINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKSCQQKW 65

: I I I |: : I : I :: : I I I I II I -
Db 26 DGAEPPVLTTHLKKVENHITEAQRFSHLPKRSAVDIEFVELSYSVREGPCW RKRG 80

Qy 66 DRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNG CELRR 123

: :|| :| : : : I : I I I : I I : I : : : : I I :| ::|:: III III

Db 81 YKTLLKCLSGKFCRRELIGIMGPSGAGKSTFMNILAG-YRESG-MKGQILVNGRPRELRT 138

Qy 124 DQFQDCFSYVljQSDVFLSSLTVRETLRYTAMLALCRSSADF SH 180

: | |::| |: | III I : =1 I I : : I -III II

Db 139 FRKMSC-- YIMQDDMLLPHLTVLEAMMVSANLKLSEKQ-EVKKELVTEILTALGLMSCSH 195



Qy 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

: : | | : | : | : : | | : | : : | II I M I : I I I : I • I I : I I 

Db 196 TRTAL LSGGQRKRLAIALELVNNPPVMFFDEPTSGLDSASCFQWSLMKSLA 247 

Qy 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300

: | : | | | | I I : : I I : I I I : I I : I : : I I = : = I I I : I I I 
Db 248 QGGRTIICTIHQPSAKLFEMFDKLYILSQGQCIFKGWTNLIPYLKGLGLHCPTYHNPAD 307 

Qy 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360 

| ::: | : : :: I I I I I : h I I : 

Db 308 FIIEVASGEYGDLNPML -- FRAVQNGLCAMAEKK SSPEKNEVPAPCPPCPPEV- 358

Qy 361 DP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTL 413 

|| : : I : I : : : I : : | : : : : : | : : I : : : 

Db 359 DPIESHTFATSTLTQFCILFKRTFLSILRDTVLTHLRFMSHWIGVLIGLLYLHIGDDAS 418 

Qy 414 KGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVL 473

| | : | | : : : : : I I I : I I : I : I I I : : 

Db 419 K--VFNNTGCLFFSMLFLMFAALMPTVLTFPLEMAVFMREHLNYWYSLKAYYLAKTMADV 476

Qy 474 PFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNS 533 

|||: I :: I : I I I I : I I III I I : : I I I = I I I : 
Db 477 P FQWC PWYC S I VYWMT GQ PAET S RFLL FS ALATATALVAQ S LGL- L I GAASN S LQVAT 535 

Qy 534 IVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGL-- NFTCGG 590

| : : : I : I I I : : : I I : | : : : I I : : : I I : : I I 
Db 536 FVGPVTAI PVLLFS GFFVS FKT I PT YLQWS S YLS YVRYGFEGVI LT - I YGMERGDLTC 592

Qy 591 SNT SMLNHPMCAI TQGVQFI EKTC P GAT S RFTANFLI LYGFI PAL VI L 638

:|: || :: :||:| I II :| 

Db 593 LEERCP FREPQS I LRALDVEDAKL YMDFLVLGI FFLALRLL 633 

Qy 639 GIVIFKVR 646 

Db 634 AYLVLRYR 641 
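The summary counts reported for the ABG4_HUMAN alignment are internally consistent if Best Local Similarity is read as identities divided by the total number of aligned columns (matches + conservative + mismatches + indels), i.e. 174 / 668, which is about 26.0%. The same column total also falls out of the aligned spans (query 10-646, database 26-641) once the 585 paired positions are counted only once. The Python sketch below checks both; the formula is an assumption inferred from the reported numbers, not taken from the search program's documentation. Query Match (16.9%) cannot be reproduced this way because the query's self-score is not given in the report.

```python
# Counts reported for the ABG4_HUMAN alignment (Matches/Conservative/Mismatches/Indels).
matches, conservative, mismatches, indels = 174, 138, 273, 83

def best_local_similarity(m, c, mm, ind):
    """Identities as a percentage of all aligned columns (assumed definition)."""
    columns = m + c + mm + ind
    return 100.0 * m / columns

columns = matches + conservative + mismatches + indels
similarity = best_local_similarity(matches, conservative, mismatches, indels)

# Cross-check: residues covered on each side, minus the paired positions
# (counted on both sides), should equal the number of alignment columns.
qy_span = 646 - 10 + 1   # query residues 10..646
db_span = 641 - 26 + 1   # database residues 26..641
paired = matches + conservative + mismatches
columns_from_spans = qy_span + db_span - paired

print(columns)                 # 668
print(f"{similarity:.1f}%")    # 26.0%
print(columns_from_spans)      # 668
```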

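The ABG4_HUMAN feature table can be sanity-checked in the same spirit: its thirteen DOMAIN/TRANSMEM rows should tile residues 1-646 without gaps, each predicted transmembrane segment spans 21 residues, the potential N-linked CARBOHYD site at 422 falls in the extracellular loop 415-425, and the potential ATP-binding NP_BIND region (102-109) lies in the N-terminal cytoplasmic domain. A minimal Python sketch, with coordinates transcribed from the FT lines; the helper names `check_tiling` and `locate` are hypothetical.

```python
# Topology rows (start, end, label) transcribed from the ABG4_HUMAN FT lines.
topology = [
    (1, 393, "CYTOPLASMIC"),
    (394, 414, "TRANSMEM 1"),
    (415, 425, "EXTRACELLULAR"),
    (426, 446, "TRANSMEM 2"),
    (447, 472, "CYTOPLASMIC"),
    (473, 493, "TRANSMEM 3"),
    (494, 503, "EXTRACELLULAR"),
    (504, 524, "TRANSMEM 4"),
    (525, 532, "CYTOPLASMIC"),
    (533, 553, "TRANSMEM 5"),
    (554, 617, "EXTRACELLULAR"),
    (618, 638, "TRANSMEM 6"),
    (639, 646, "CYTOPLASMIC"),
]
LENGTH = 646          # sequence length from the ID/SQ lines
CARBOHYD_SITE = 422   # potential N-linked glycosylation site
NP_BIND = (102, 109)  # potential ATP-binding region

def check_tiling(segments, length):
    """True if segments cover residues 1..length in order, with no gaps or overlaps."""
    expected_start = 1
    for start, end, _ in segments:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == length + 1

def locate(segments, position):
    """Return the label of the segment containing position, or None."""
    for start, end, label in segments:
        if start <= position <= end:
            return label
    return None

tm_lengths = [end - start + 1 for start, end, label in topology
              if label.startswith("TRANSMEM")]

print(check_tiling(topology, LENGTH))            # True
print(tm_lengths)                                # [21, 21, 21, 21, 21, 21]
print(locate(topology, CARBOHYD_SITE))           # EXTRACELLULAR
print([locate(topology, p) for p in NP_BIND])    # ['CYTOPLASMIC', 'CYTOPLASMIC']
```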


Search completed: February 27, 2004, 07:12:35 
Job time: 12.0952 secs



